Research Paper Cockpit

Needs Review

Unresolved papers that are still in your triage queue.

Daily Archives

Quick jump into generated daily digests.

Research Workflow

Latest digest: 2026-05-13.

Papers

1366 visible entries

arxiv Score 35.5

Modular Continual Learning via Zero-Leakage Reconstruction Routing and Autonomous Task Discovery

2026-04-15 · Noureddine Kermiche

Research Track A · General AI

Catastrophic forgetting remains a primary hurdle in sequential task learning for artificial neural networks. We propose a silicon-native modular architecture that achieves structural parameter isolation using Task-Specific Experts and a distributed, outlier-based Gatekeeper. Moving beyond traditional sequential consoli…

Review
pending
Role
unreviewed
Read
now
arxiv Score 30.5

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

2026-03-12 · Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin

Research Track A · General AI

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 30.0

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

2026-04-17 · Alexandra Dragomir, Ioana Pintilie, Antonio Barbalau, Marius Dragoi, Florin Brad, Cristian Daniel Paduraru, Alexandru Tifrea, Elena Burceanu, Radu Tudor Ionescu

Research Track A · General AI

Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs), by sequentially learning a low-rank update matrix for each task. To mitigate catastrophic forgetting, state-of-the-art approaches impose constraints on new adapters with respect to the previous ones,…

Review
pending
Role
unreviewed
Read
now
arxiv Score 29.0

Structured Distillation of Web Agent Capabilities Enables Generalization

2026-04-09 · Xing Han Lù, Siva Reddy

Research Track B · General AI

Frontier LLMs can navigate complex websites, but their cost and reliance on third-party APIs make local deployment impractical. We introduce Agent-as-Annotators, a framework that structures synthetic trajectory generation for web agents by analogy to human annotation roles, replacing the Task Designer, Annotator, and S…

Review
pending
Role
unreviewed
Read
now
arxiv Score 28.0

ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents

2026-03-20 · Zijian Lu, Yiping Zuo, Yupeng Nie, Xin He, Weibei Fan, Chen Dai, Lianyong Qi, Shi Jin

Research Track B · General AI

Despite rapid progress in multimodal GUI agents, reusable skill acquisition remains difficult because on-demand generated skills often leave action semantics, state assumptions, and success criteria implicit. This makes them brittle to execution errors, hard to verify, and difficult to repair. We present ContractSkill,…

Review
pending
Role
unreviewed
Read
now
arxiv Score 27.3

Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems

2026-04-22 · Pavel Salovskii, Iuliia Gorshkova

General AI

This paper presents a hybrid architecture for intelligent systems in which large language models (LLMs) are extended with an external ontological memory layer. Instead of relying solely on parametric knowledge and vector-based retrieval (RAG), the proposed approach constructs and maintains a structured knowledge graph …

Review
pending
Role
unreviewed
Read
now
arxiv Score 26.8

ATP-Bench: Towards Agentic Tool Planning for MLLM Interleaved Generation

2026-03-31 · Yinuo Liu, Zi Qian, Heng Zhou, Jiahao Zhang, Yajie Zhang, Zhihang Li, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang

General AI

Interleaved text-and-image generation represents a significant frontier for Multimodal Large Language Models (MLLMs), offering a more intuitive way to convey complex information. Current paradigms rely on either image generation or retrieval augmentation, yet they typically treat the two as mutually exclusive paths, fa…

Review
pending
Role
unreviewed
Read
now
arxiv Score 26.5

Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes

2026-03-29 · Ashish Pandey

Research Track A

Sequential fine-tuning of pretrained language encoders often overwrites previously acquired capabilities, but the forgetting behavior of parameter-efficient updates remains under-characterized. We present a controlled empirical study of Low-Rank Adaptation (LoRA) in sequential transformer encoder fine-tuning with compa…

Review
pending
Role
unreviewed
Read
now
arxiv Score 26.5

HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation

2026-04-20 · Lixian Chen, Jianhong Tan

Research Track A

Adapting foundation models under resource budgets relies heavily on Parameter-Efficient Fine-Tuning (PEFT), with LoRA being a standard modular solution. However, LoRA suffers from spectral interference. Low-rank updates often concentrate energy on the leading singular directions of pretrained weights, perturbing genera…

Review
pending
Role
unreviewed
Read
now
arxiv Score 26.4

PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning

2026-05-01 · Beining Wu, Zihao Ding, Jun Huang

Research Track A · General AI

While current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) is built on the unverified assumption that routing isolates task-specific knowledge into disjoint experts, we argue that routing operates per-sample, while forgetting accumulates across the task sequence, and gr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 26.0

Towards Lifelong Aerial Autonomy: Geometric Memory Management for Continual Visual Place Recognition in Dynamic Environments

2026-04-10 · Xingyu Shao, Zhiqiang Yan, Liangzheng Sun, Mengfan He, Chao Chen, Jinhui Zhang, Chunyu Li, Ziyang Meng

Research Track A · General AI

Robust geo-localization in changing environmental conditions is critical for long-term aerial autonomy. While visual place recognition (VPR) models perform well when airborne views match the training domain, adapting them to shifting distributions during sequential missions triggers catastrophic forgetting. Existing co…

Review
pending
Role
unreviewed
Read
now
arxiv Score 26.0

Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

2026-04-23 · Nicolae Filat, Ahmed Hussain, Konstantinos Kalogiannis, Elena Burceanu

Research Track A · General AI

Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same stream can induce d…

Review
pending
Role
unreviewed
Read
now
arxiv Score 25.8

STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?

2026-05-07 · Hanxiang Chao, Yihan Bai, Rui Sheng, Tianle Li, Yushi Sun

Research Track A · General AI

Large Language Model (LLM) agents are increasingly expected to maintain coherent, long-term personalized memory, yet current benchmarks primarily measure static fact retrieval, overlooking the ability to revise stored beliefs when new evidence emerges. We identify a critical and underexplored failure mode, Implicit Con…

Review
pending
Role
unreviewed
Read
now
huggingface Score 25.5

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

2026-04-15 · Xiaohua Wang, Muzhao Tian, Yuqi Zeng, Zisu Huang, Jiakang Yuan, Bowen Chen, Jingwen Xu, Mingbo Zhou, Wenhao Liu, Muling Wu, Zhengkang Guo, Qi Qian, Yifei Wang, Feiran Zhang, Ruicheng Yin, Shihan Dou, Changze Lv, Tao Chen, Kaitao Song, Xu Tan, Tao Gui, Xiaoqing Zheng, Xuanjing Huang

General AI

Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and multimodal large language models (MLLMs) toward human-preferred behaviors. However, these approaches introduce a systemic vulnerability: reward hacking, where models exploit…

Review
pending
Role
unreviewed
Read
now
arxiv Score 25.5

Learning, Fast and Slow: Towards LLMs That Adapt Continually

2026-05-12 · Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, Joseph E. Gonzalez, Matei Zaharia, Kurt Keutzer, Inderjit S Dhillon, Rishabh Agarwal, Devvrit Khatri

Research Track A · General AI

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM parameters can chea…

Review
pending
Role
unreviewed
Read
now
arxiv Score 25.3

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation

2026-04-17 · Dian Shao, Zhengzheng Xu, Peiyang Wang, Like Liu, Yule Wang, Jieqi Shi, Jing Huo

General AI

UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi-step instructions over long horizons. Existing zero-shot methods remain limited, as they often rely on large base models, generic prompts, and loosely coordinated mod…

Review
pending
Role
unreviewed
Read
now
arxiv Score 25.3

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

2026-04-21 · Shuai Wang, Hongyi Zhu, Jia-Hong Huang, Yixian Shen, Chengxi Zeng, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

General AI

Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence groun…

Review
pending
Role
unreviewed
Read
now
arxiv Score 25.0

Continual Calibration: Coverage Can Collapse Before Accuracy in Lifelong LLM Fine-Tuning

2026-04-27 · Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

Research Track A · General AI

Continual learning for large language models is typically evaluated through accuracy retention under sequential fine-tuning. We argue that this perspective is incomplete, because uncertainty reliability can degrade earlier and more sharply than top-1 performance. We study this empirically by measuring conformal coverag…

Review
pending
Role
unreviewed
Read
now
arxiv Score 25.0

Online Continual Learning with Dynamic Label Hierarchies

2026-05-12 · Xinrui Wang, Shao-Yuan Li, Bartłomiej Twardowski, Alexandra Gomez-Villa, Songcan Chen

Research Track A · General AI

Online Continual Learning (OCL) aims to learn from endless non-stationary data streams, yet most existing methods assume a flat label space and overlook the hierarchical organization of real-world concepts that evolves both horizontally (sibling classes) and vertically (coarse or fine categories). To bett…

Review
pending
Role
unreviewed
Read
now
arxiv Score 25.0

Stop Marginalizing My Dreams: Model Inversion via Laplace Kernel for Continual Learning

2026-05-12 · Patryk Krukowski, Jacek Tabor, Przemysław Spurek, Marek Śmieja, Łukasz Struski

Research Track A · General AI

Data-free continual learning (DFCIL) relies on model inversion to synthesize pseudo-samples and mitigate catastrophic forgetting. However, existing inversion methods are fundamentally limited by a simplifying assumption: they model feature distributions using diagonal covariance, effectively ignoring correlations that …

Review
pending
Role
unreviewed
Read
now
huggingface Score 24.8

Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

2026-05-11 · Shijue Huang, Hangyu Guo, Chenxin Li, Junting Lu, Xinyu Geng, Zhaochen Su, Zhenyu Li, Shuang Chen, Hongru Wang, Yi R. Fung

General AI

Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as transient outputs, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 24.5

Universe Routing: Why Self-Evolving Agents Need Epistemic Control

2026-03-16 · Zhaohui Geoffrey Wang

Research Track A · General AI

A critical failure mode of current lifelong agents is not lack of knowledge, but the inability to decide how to reason. When an agent encounters "Is this coin fair?" it must recognize whether to invoke frequentist hypothesis testing or Bayesian posterior inference - frameworks that are epistemologically incompatible. M…

Review
pending
Role
unreviewed
Read
now
arxiv Score 24.5

Face-D$^2$CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection

2026-04-09 · Yushuo Zhang, Yu Cheng, Yongkang Hu, Jiuan Zhou, Jiawei Chen, Yuan Xie, Zhaoxia Yin

Research Track A

The rapid advancement of facial forgery techniques poses severe threats to public trust and information security, making facial DeepFake detection a critical research priority. Continual learning provides an effective approach to adapt facial DeepFake detection models to evolving forgery patterns. However, existing met…

Review
pending
Role
unreviewed
Read
now
huggingface Score 24.5

PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents

2026-04-12 · Mikhail Menschikov, Dmitry Evseev, Victoria Dochkina, Ruslan Kostoev, Ilia Perepechkin, Petr Anokhin, Nikita Semenov, Evgeny Burnaev

General AI

Personalizing language models by effectively incorporating user interaction history remains a central challenge in the development of adaptive AI systems. While large language models (LLMs), combined with Retrieval-Augmented Generation (RAG), have improved factual accuracy, they often lack structured memory and fail to…

Review
pending
Role
unreviewed
Read
now
arxiv Score 24.5

Fact4ac at the Financial Misinformation Detection Challenge Task: Reference-Free Financial Misinformation Detection via Fine-Tuning and Few-Shot Prompting of Large Language Models

2026-04-16 · Cuong Hoang, Le-Minh Nguyen

Research Track A · General AI

The proliferation of financial misinformation poses a severe threat to market stability and investor trust, misleading market behavior and creating critical information asymmetry. Detecting such misleading narratives is inherently challenging, particularly in real-world scenarios where external evidence or supplementar…

Review
pending
Role
unreviewed
Read
now
arxiv Score 24.3

MEME: Multi-entity & Evolving Memory Evaluation

2026-05-12 · Seokwon Jung, Alexander Rubinstein, Arnas Uselis, Sangdoo Yun, Seong Joon Oh

General AI

LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not …

Review
pending
Role
unreviewed
Read
now
arxiv Score 24.2

Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation

2026-04-30 · Jing Zhang, Wentao Jiang, Tao Huang, Zhiwei Wang, Jianxin Liu, Jian Chen, Ping Ye, Gang Wang, Zengmao Wang, Bo Du, Dacheng Tao

General AI

Ultrasound interpretation requires both precise lesion localization and holistic clinical reasoning, yet existing methods typically excel at only one of these capabilities: specialized detectors offer strong localization but limited reasoning, whereas multimodal large language models (MLLMs) provide flexible reasoning …

Review
pending
Role
unreviewed
Read
now
arxiv Score 24.0

TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning

2026-04-28 · Dominik Żurek, Kamil Faber, Marcin Pietron, Paweł Gajewski, Roberto Corizzo

Research Track A · General AI

Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time, but adapting the model in live environment interactions is expensive, risky, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.8

Enhancing Web Agents with a Hierarchical Memory Tree

2026-03-07 · Yunteng Tan, Zhi Gao, Xinxiao Wu

Research Track B · General AI

Large language model-based web agents have shown strong potential in automating web interactions through advanced reasoning and instruction following. While retrieval-based memory derived from historical trajectories enables these agents to handle complex, long-horizon tasks, current methods struggle to generalize acro…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.8

MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

2026-04-06 · Shu Wang, Edwin Yu, Oscar Love, Tom Zhang, Tom Wong, Steve Scargall, Charles Fan

General AI

Large Language Model (LLM) agents require persistent memory to maintain personalization, factual continuity, and long-horizon reasoning, yet standard context-window and retrieval-augmented generation (RAG) pipelines degrade over multi-session interactions. We present MemMachine, an open-source memory system that integr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.5

BID-LoRA: A Parameter-Efficient Framework for Continual Learning and Unlearning

2026-04-14 · Jagadeesh Rachapudi, Ritali Vatsi, Praful Hambarde, Amit Shukla

Research Track A · General AI

Recent advances in deep learning underscore the need for systems that can not only acquire new knowledge through Continual Learning (CL) but also remove outdated, sensitive, or private information through Machine Unlearning (MU). However, while CL methods are well-developed, MU techniques remain in early stages, creati…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.5

Balancing Stability and Plasticity in Sequentially Trained Early-Exiting Neural Networks

2026-05-06 · Alaa Zniber, Ouassim Karrakchou, Mounir Ghogho

Research Track A · General AI

Early-exiting neural networks enable adaptive inference by allowing inputs to exit at intermediate classifiers, reducing computation for easy samples while maintaining high accuracy. In practice, exits can be trained sequentially by incrementally adding them to a shared backbone; however, this sequential training can c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.3

SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

2026-04-21 · Josue Torres-Fonseca, Naihao Deng, Yinpei Dai, Shane Storks, Yichi Zhang, Rada Mihalcea, Casey Kennington, Joyce Chai

General AI

Multimodal Large Language Models are increasingly adopted as autonomous agents in interactive environments, yet their ability to proactively address safety hazards remains insufficient. We introduce SafetyALFRED, built upon the embodied agent benchmark ALFRED, augmented with six categories of real-world kitchen hazards…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.3

K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology

2026-04-27 · Soyeon Kim, Cheongwoong Kang, Myeongjin Lee, Eun-Chul Chang, Jaedeok Lee, Jaesik Choi

General AI

The development of practical (multimodal) large language model assistants for Korean weather forecasters is hindered by the absence of a multidimensional, expert-level evaluation framework grounded in authoritative sources. To address this, we introduce K-MetBench, a diagnostic benchmark grounded in national qualificat…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.2

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

2026-04-29 · GLM-V Team, Wenyi Hong, Xiaotao Gu, Ziyang Pan, Zhen Yang, Yuting Wang, Yue Wang, Yuanchang Yue, Yu Wang, Yanling Wang, Yan Wang, Xijun Liu, Wenmeng Yu, Weihan Wang, Wei Li, Shuaiqi Duan, Sheng Yang, Ruiliang Lv, Mingdao Liu, Lihang Pan, Ke Ning, Junhui Ji, Jinjiang Wang, Jing Chen, Jiazheng Xu, Jiale Zhu, Jiale Cheng, Ji Qi, Guobing Gan, Guo Wang, Cong Yao, Zijun Dou, Zihao Zhou, Zihan Wang, Zhiqi Ge, Zhijie Li, Zhenyu Hou, Zhao Xue, Zehui Wang, Zehai He, Yusen Liu, Yukuo Cen, Yuchen Li, Yuan Wang, Yijian Lu, Yanzi Wang, Yadong Xue, Xinyu Zhang, Xinyu Liu, Wenkai Li, Tianyu Tong, Tianshu Zhang, Shengdong Yan, Qinkai Zheng, Mingde Xu, Licheng Bao, Jiaxing Xu, Jiaxin Fan, Jiawen Qian, Jiali Chen, Jiahui Lin, Haozhi Zheng, Haoran Wang, Haochen Li, Fan Yang, Dan Zhang, Chuangxin Zhao, Chengcheng Wu, Boyan Shi, Bowei Jia, Baoxu Wang, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Minlie Huang, Yuxiao Dong, Jie Tang, V Team

General AI

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, video…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.0

Continual Learning in Large Language Models: Methods, Challenges, and Opportunities

2026-03-13 · Hongyang Chen, Zhongwu Sun, Hongfei Ye, Kunchi Li, Xuemin Lin

Research Track A · General AI

Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting, a critical limitation of the static pre-training paradigm inherent to modern LLMs. This survey presents a comprehensiv…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.0

Temporal Memory for Resource-Constrained Agents: Continual Learning via Stochastic Compress-Add-Smooth

2026-03-31 · Michael Chertkov

Research Track A · General AI

An agent that operates sequentially must incorporate new experience without forgetting old experience, under a fixed memory budget. We propose a framework in which memory is not a parameter vector but a stochastic process: a Bridge Diffusion on a replay interval $[0,1]$, whose terminal marginal encodes the present and …

Review
pending
Role
unreviewed
Read
now
huggingface Score 23.0

Memory Intelligence Agent

2026-04-06 · Jingyang Qiao, Weicheng Meng, Yu Cheng, Zhihang Lin, Zhizhong Zhang, Xin Tan, Jingyu Gong, Kun Shao, Yuan Xie

General AI

Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, while suffering from key li…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.0

Information as Structural Alignment: A Dynamical Theory of Continual Learning

2026-04-08 · Radu Negulescu

Research Track A · General AI

Catastrophic forgetting is not an engineering failure. It is a mathematical consequence of storing knowledge as global parameter superposition. Existing methods, such as regularization, replay, and frozen subnetworks, add external mechanisms to a shared-parameter substrate. None derives retention from the learning dyna…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.8

A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

2026-03-20 · Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette

Research Track B · General AI

Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing L…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.8

Multi-Agent Video Recommenders: Evolution, Patterns, and Open Challenges

2026-04-02 · Srivaths Ranganathan, Abhishek Dharmaratnakar, Anushree Sinha, Debanshu Das

General AI

Video recommender systems are among the most popular and impactful applications of AI, shaping content consumption and influencing culture for billions of users. Traditional single-model recommenders, which optimize static engagement metrics, are increasingly limited in addressing the dynamic requirements of modern pla…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.8

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

2026-04-09 · Wenbo Hu, Xin Chen, Yan Gao-Tian, Yihe Deng, Nanyun Peng, Kai-Wei Chang

General AI

Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challenges: the extreme vari…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.8

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

2026-05-07 · Bodong Du, Bowen Liu, Yang Yu, Xinpeng Ding, Zhiheng Wu, Shuning Wang, Shuo Nie, Naiming Liu, Qifeng Chen, Yangqiu Song, Xiaomeng Li

General AI

Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures contain highly redundant anatomical views, while decisive evidence is temporally sparse,…

Review
pending
Role
unreviewed
Read
now
huggingface Score 22.5

PersonaVLM: Long-Term Personalized Multimodal LLMs

2026-03-20 · Chang Nie, Chaoyou Fu, Yifan Zhang, Haihua Yang, Caifeng Shan

General AI

Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture use…

Review
pending
Role
unreviewed
Read
now
huggingface Score 22.5

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

2026-04-20 · Xinping Lei, Xinyu Che, Junqi Xiong, Chenchen Zhang, Yukai Huang, Chenyu Zhou, Haoyang Huang, Minghao Liu, Letian Zhu, Hongyi Ye, Jinhua Hao, Ken Deng, Zizheng Zhan, Han Li, Dailin Li, Yifan Yao, Ming Sun, Zhaoxiang Zhang, Jiaheng Liu

General AI

Large language models are rapidly evolving into interactive coding agents capable of end-to-end web coding, yet existing benchmarks evaluate only narrow slices of this capability, typically text-conditioned generation with static-correctness metrics, leaving visual fidelity, interaction quality, and codebase-level reas…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.5

Region4Web: Rethinking Observation Space Granularity for Web Agents

2026-05-08 · Donguk Kwon, Dongha Lee

Research Track B · General AI

Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as the action space, leaving the page's functional organization implicit and forcing the agent to infer it from element-leve…

Review
pending
Role
unreviewed
Read
now
huggingface Score 22.4

Audio-Visual Intelligence in Large Foundation Models

2026-05-05 · You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei

General AI

Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines that can perceive, generate, and interact in the multimodal real world. In the era of large foundation models, joint modeling of audio and vision has become increasing…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.3

V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization

2026-04-22 · Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng, Fengying Xie, Zhiguo Jiang, Cao Liu, Ke Zeng, Haopeng Zhang

General AI

We introduce V-tableR1, a process-supervised reinforcement learning framework that elicits rigorous, verifiable reasoning from multimodal large language models (MLLMs). Current MLLMs trained solely on final outcomes often treat visual reasoning as a black box, relying on superficial pattern matching rather than perform…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.3

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

2026-05-12 · Runhui Huang, Jie Wu, Rui Yang, Zhe Liu, Hengshuang Zhao

General AI

In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model's intrinsic potential to perform advanced re…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.2

DenseStep2M: A Scalable, Training-Free Pipeline for Dense Instructional Video Annotation

2026-04-29 · Mingji Ge, Qirui Chen, Zeqian Li, Weidi Xie

General AI

Long-term video understanding requires interpreting complex temporal events and reasoning over procedural activities. While instructional video corpora, like HowTo100M, offer rich resources for model training, they present significant challenges, including noisy ASR transcripts and inconsistent temporal alignments betw…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.0

Improving Sparse Memory Finetuning

2026-04-06 · Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta

Research Track A · General AI

Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, like full finetuning or parameter-efficient methods (e.g., LoRA), face a fundamental trade-off: cat…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.0

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

2026-04-07 · Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz

Research Track B · General AI

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully exe…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.0

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

2026-04-22 · Noah Flynn

Research Track A · General AI

Large language models (LLMs) often exhibit performance disparities across languages, with naive multilingual fine-tuning frequently degrading performance due to negative cross-lingual interference. To address this, we introduce COMPASS (COntinual Multilingual PEFT with Adaptive Semantic Sampling), a novel data-centric …

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.0

CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

2026-05-07 · Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari

Research Track A · General AI

Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in thre…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.9

AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling

2026-04-29 · Fazle Elahi Faisal, Qianhui Wu, Baolin Peng, Jianfeng Gao

Research Track B · General AI

Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-quality web trajectory training data. Existing automatic trajectory generation methods suffer from incomplete website cov…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.8

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

2026-03-23 · Shoubin Yu, Lei Shu, Antoine Yang, Yao Fu, Srinivas Sunkara, Maria Wang, Jindong Chen, Mohit Bansal, Boqing Gong

Research Track B · General AI

Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution. However, current web-agent benchmarks suffer from a critical limitation: they focus entirely on web-based interaction and perception, lacking grounding in the user's real-world physical surroundings. This li…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.8

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

2026-04-09 · Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang, Kunyu Shi, Guannan Zhang, Ruixuan Li, Yixiong Zou

General AI

The advent of agentic multimodal models has empowered systems to actively interact with external environments. However, current agents suffer from a profound meta-cognitive deficit: they struggle to arbitrate between leveraging internal knowledge and querying external utilities. Consequently, they frequently fall prey …

Review
pending
Role
unreviewed
Read
now
huggingface Score 21.5

OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video

2026-04-13 · Junfu Pu, Yuxin Chen, Teng Wang, Ying Shan

General AI

Current multimodal large language models (MLLMs) have demonstrated remarkable capabilities in short-form video understanding, yet translating long-form cinematic videos into detailed, temporally grounded scripts remains a significant challenge. This paper introduces the novel video-to-script (V2S) task, aiming to gener…

Review
pending
Role
unreviewed
Read
now
huggingface Score 21.5

EasyVideoR1: Easier RL for Video Understanding

2026-04-18 · Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Dingyu Yao, Zheng Lin, Peng Fu, Nan Duan, Jiaqi Wang

General AI

Reinforcement learning from verifiable rewards (RLVR) has demonstrated remarkable effectiveness in improving the reasoning capabilities of large language models. As models evolve into natively multimodal architectures, extending RLVR to video understanding becomes increasingly important yet remains largely unexplored, …

Review
pending
Role
unreviewed
Read
now
huggingface Score 21.5

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

2026-04-22 · Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha

General AI

Long-horizon interactive environments are a testbed for evaluating agents' skill usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under delayed rewards and partial observability. Games are a good testbed for evaluating agent …

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.5

Cortex-Inspired Continual Learning: Unsupervised Instantiation and Recovery of Functional Task Networks

2026-04-27 · Kevin McKee, Thomas Hazy, Yicong Zheng, Zacharie Bugaud, Thomas Miconi

Research Track A · General AI

Block-sequential continual learning demands that a single model both protect prior solutions from catastrophic forgetting and efficiently infer at inference time which prior solution matches the current input without task labels. We present Functional Task Networks (FTN), a parameter-isolation method inspired by struct…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.5

Attribution-Guided Continual Learning for Large Language Models

2026-05-06 · Yazheng Liu, Yuxuan Wan, Rui Xu, Xi Zhang, Sihong Xie, Hui Xiong

Research Track A · General AI

Large language models (LLMs) often suffer from catastrophic forgetting in continual learning: after learning new tasks sequentially, they perform worse on earlier tasks. Existing methods mitigate catastrophic forgetting by data replay, parameter freezing, or regularization. However, these methods lack semantic awarenes…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.4

HERCULES: Hardware-Efficient, Robust, Continual Learning Neural Architecture Search

2026-05-03 · Matteo Gambella, Fabrizio Pittorino, Manuel Roveri

Research Track A · General AI

Neural Architecture Search (NAS) has emerged as a powerful framework for automatically discovering neural architectures that balance accuracy and efficiency. However, as AI transitions from static benchmarks to real-world deployment, the traditional focus on hardware-aware efficiency is no longer sufficient. We observe…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.3

Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games

2026-04-13 · Keyang Zhong, Junlin Xie, Hefeng Wu, Haofeng Li, Guanbin Li

General AI

Vision-language models (VLMs) have shown impressive capabilities in perceptual tasks, yet they degrade in complex multi-hop reasoning under multiplayer game settings with imperfect and deceptive information. In this paper, we study a representative multiplayer task, Murder Mystery Games, which require inferring hidden …

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.3

GAM: Hierarchical Graph-based Agentic Memory for LLM Agents

2026-04-14 · Zhaofen Wu, Hanrong Zhang, Fulin Lin, Wujiang Xu, Xinran Xu, Yankai Chen, Henry Peng Zou, Shaowen Chen, Weizhi Zhang, Xue Liu, Philip S. Yu, Hongwei Wang

Research Track A · General AI

To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.3

MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation

2026-04-20 · Xingchen Xiao, Heyan Huang, Runheng Liu, Jincheng Xie

General AI

Large language models (LLMs) are widely used in retrieval-augmented generation (RAG) to incorporate external knowledge at inference time. However, when retrieved contexts are noisy, incomplete, or heterogeneous, a single generation process often struggles to reconcile evidence effectively. We propose MASS-RAG,…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.3

SpaMEM: Benchmarking Dynamic Spatial Reasoning via Perception-Memory Integration in Embodied Environments

2026-04-24 · Chih-Ting Liao, Xi Xiao, Chunlei Meng, Zhangquan Chen, Yitong Qiao, Weilin Zhou, Tianyang Wang, Xu Zheng, Xin Cao

General AI

Multimodal large language models (MLLMs) have advanced static visual-spatial reasoning, yet they often fail to preserve long-horizon spatial coherence in embodied settings where beliefs must be continuously revised from egocentric observations under environmental change. We introduce SpaMEM (Spatial Memory from Action…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.3

Joint sparse coding and temporal dynamics support context reconfiguration

2026-05-11 · Qianqian Shi, Yue Che, Faqiang Liu, Hongyi Li, Mingkun Xu, Sandra Reinert, Pieter M. Goltstein, Rong Zhao, Luping Shi

Research Track A

Adaptive behavior requires the brain to transition between distinct contexts while maintaining representations of prior experience. The ability to reconfigure neural representations without erasing previously acquired knowledge is central to learning in dynamic environments, yet the neural mechanisms that support this …

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.3

Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images

2026-05-12 · Yuangong Chen, Wai Keung Wong, Jiaxing Li, Ioannis Patras, Xu Zheng

General AI

Multimodal Large Language Models (MLLMs) show strong visual perception, yet remain limited in reasoning about space under changing viewpoints. We study this challenge as Perspective-Conditioned Spatial Reasoning (PCSR) in 360-degree omnidirectional images, where broad scene coverage reduces ambiguity from partial obser…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.3

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

2026-05-12 · Alireza Nadali, Patrick Cooper, Ashutosh Trivedi, Alvaro Velasquez

General AI

We introduce KV-Fold, a simple, training-free long-context inference protocol that treats the key-value (KV) cache as the accumulator in a left fold over sequence chunks. At each step, the model processes the next chunk conditioned on the accumulated cache, appends the newly produced keys and values, and passes the enl…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.2

Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering

2026-04-29 · Happy Bhati

General AI

The arrival of large language models (LLMs) capable of multi-step reasoning, tool use, and long-horizon planning has produced a qualitative shift in software engineering. Where earlier code-completion tools such as GitHub Copilot operated at the granularity of a line or function, modern agentic systems -- Claude Code, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.2

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

2026-05-01 · Derong Xu, Shuochen Liu, Pengfei Luo, Pengyue Jia, Yingyi Zhang, Yi Wen, Yimin Deng, Wenlin Zhang, Enhong Chen, Xiangyu Zhao, Tong Xu

General AI

Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memor…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.2

Towards Multi-Agent Autonomous Reasoning in Hydrodynamics

2026-05-01 · Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson

General AI

Single-agent systems (SAS) have become the default pattern for LLM-driven scientific workflows, but routing planning, tool use, and synthesis through a single context window comes with a well-known cost: as tool specifications and observational traces accumulate, the effective context available for each decision shrink…

Review
pending
Role
unreviewed
Read
now
huggingface Score 21.0

FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

2026-03-26 · Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Yong Liu, Chi Zhang

General AI

This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) in solving real-world financial problems through tool invocation of financial model context protocols. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic us…

Review
pending
Role
unreviewed
Read
now
huggingface Score 21.0

Learning to Retrieve from Agent Trajectories

2026-03-30 · Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen

General AI

Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasi…

Review
pending
Role
unreviewed
Read
now
huggingface Score 21.0

Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design

2026-03-30 · Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo

General AI

Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bo…

Review
pending
Role
unreviewed
Read
now
huggingface Score 21.0

Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search

2026-04-09 · Chuzhan Hao, Wenfeng Feng, Guochao Jiang, Guofeng Quan, Guohua Liu, Yuewei Zhang

General AI

Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcom…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.0

WebXSkill: Skill Learning for Autonomous Web Agents

2026-04-14 · Zhaoyang Wang, Qianhui Wu, Xuchao Zhang, Chaoyun Zhang, Wenlin Yao, Fazle Elahi Faisal, Baolin Peng, Si Qin, Suman Nath, Qingwei Lin, Chetan Bansal, Dongmei Zhang, Saravan Rajmohan, Jianfeng Gao, Huaxiu Yao

Research Track B · General AI

Autonomous web agents powered by large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. A key bottleneck is the grounding gap in existing skill formulations: textual workflow skills provide natural language guidance but cannot be directl…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.0

Recovery Guarantees for Continual Learning of Dependent Tasks: Memory, Data-Dependent Regularization, and Data-Dependent Weights

2026-04-19 · Liangzu Peng, Uday Kiran Reddy Tadipatri, Ziqing Xu, Eric Eaton, René Vidal

Research Track A · General AI

Continual learning (CL) is concerned with learning multiple tasks sequentially without forgetting previously learned tasks. Despite substantial empirical advances over recent years, the theoretical development of CL remains in its infancy. At the heart of developing CL theory lies the challenge that the data distributi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.0

WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent

2026-04-20 · Lingfeng Zhang, Yongan Sun, Jinpeng Hu, Hui Ma, Yang Ying, Kuien Liu, Zenglin Shi, Meng Wang

Research Track B · General AI

Recent advancements in large language models (LLMs) have empowered autonomous web agents to execute natural language instructions directly on real-world webpages. However, existing agents often struggle with complex tasks involving dynamic interactions and long-horizon execution due to rigid planning strategies and hal…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.0

FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory

2026-04-22 · Yingjie Gu, Bo Xiong, Yijuan Guo, Chao Li, Xiaojing Zhang, Liqiang Wang, Pengcheng Ren, Qi Sun, Jingyao Ma, Shidang Shi

Research Track A · General AI

For LLM agents, memory management critically impacts efficiency, quality, and security. While much research focuses on retention, selective forgetting, inspired by human cognitive processes (hippocampal indexing/consolidation theory and Ebbinghaus forgetting curve), remains underexplored. We argue that in resource-cons…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.8

CL-VISTA: Benchmarking Continual Learning in Video Large Language Models

2026-04-01 · Haiyang Guo, Yichen Shi, Fei Zhu, Wenzhuo Liu, Hongbo Zhao, Fanhu Zeng, Shijie Ma, Da-Han Wang, Xu-Yao Zhang

Research Track A · General AI

Video Large Language Models (Video-LLMs) require continual learning to adapt to non-stationary real-world data. However, existing benchmarks fall short of evaluating modern foundation models: many still rely on models without large-scale pre-training, and prevailing benchmarks typically partition a single dataset into …

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.8

MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library

2026-04-07 · Md Shamimul Islam, Luis G. Jaimes, Ayesha S. Dina

Research Track A · General AI

Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods are effective for known attack patterns, but they struggle to detect zero-day attacks and often miss modified variants of previously known attacks, while many machine learning approaches offer limited interpretability. These …

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.8

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

2026-05-07 · Tianle Wang, Zhaoyang Wang, Guangchen Lan, Xinpeng Wei, Sipeng Zhang, Guanwen Qiu, Abulhair Saparov

General AI

Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that offers independent …

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.8

Consolidation-Expansion Operator Mechanics: A Unified Framework for Adaptive Learning

2026-05-11 · Debashis Guha

Research Track A · General AI

Every adaptive learning system must alternate between two operations: consolidating what it already knows and expanding into new evidence. We propose Consolidation-Expansion Operator Mechanics (OpMech), a framework that makes this structure precise. The central object is the order-gap $\Ogap(θ; e)$, the d…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.5

AI Planning Framework for LLM-Based Web Agents

2026-03-13 · Orit Shahnovsky, Rotem Dror

Research Track B · General AI

Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why they fail or how they plan. This paper addresses this gap by formally treating web tasks as sequ…

Review
pending
Role
unreviewed
Read
now
huggingface Score 20.5

CocoaBench: Evaluating Unified Digital Agents in the Wild

2026-04-13 · CocoaBench Team, Shibo Hao, Zhining Zhang, Zhiqi Liang, Tianyang Liu, Yuheng Zha, Qiyue Gao, Jixuan Chen, Zilong Wang, Zhoujun Cheng, Haoxiang Zhang, Junli Wang, Hexi Jin, Boyuan Zheng, Kun Zhou, Yu Wang, Feng Yao, Licheng Liu, Yijiang Li, Zhifei Li, Zhengtao Han, Pracha Promthaw, Tommaso Cerruti, Xiaohan Fu, Ziqiao Ma, Jingbo Shang, Lianhui Qin, Julian McAuley, Eric P. Xing, Zhengzhong Liu, Rupesh Kumar Srivastava, Zhiting Hu

General AI

LLM agents now perform strongly in software engineering, deep research, GUI automation, and various other applications, while recent agent scaffolds and models are increasingly integrating these capabilities into unified systems. Yet, most evaluations still test these capabilities in isolation, which leaves a gap for m…

Review
pending
Role
unreviewed
Read
now
huggingface Score 20.5

DR^{3}-Eval: Towards Realistic and Reproducible Deep Research Evaluation

2026-04-16 · Qianqian Xie, Qingheng Xiong, He Zhu, Tiantian Xia, Xueming Han, Fanyu Meng, Jiakai Wang, Zhiqi Bai, Chengkang Jiang, Zhaohui Wang, Yubin Guo, Yuqing Wen, Jiayang Mao, Zijie Zhang, Shihao Li, Yanghai Wang, Yuxiang Ren, Junlan Feng, Jiaheng Liu

General AI

Deep Research Agents (DRAs) aim to solve complex, long-horizon research tasks involving planning, retrieval, multimodal understanding, and report generation, yet their evaluation remains challenging due to dynamic web environments and ambiguous task definitions. We propose DR^{3}-Eval, a realistic and reproducible benc…

Review
pending
Role
unreviewed
Read
now
huggingface Score 20.5

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

2026-04-16 · Jun Wang, Shuo Tan, Zelong Sun, Tiancheng Gu, Yongle Zhao, Ziyong Feng, Kaicheng Yang, Cewu Lu

General AI

Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG systems typically rely on generic retrieval signals that overlook the fine-grained visual semantics essential for complex reasoning. To address this limitation, we propose UniDo…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.5

Hidden Failure Modes of Gradient Modification under Adam in Continual Learning, and Adaptive Decoupled Moment Routing as a Repair

2026-04-24 · Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song

Research Track A · General AI

Many continual-learning methods modify gradients upstream (e.g., projection, penalty rescaling, replay mixing) while treating Adam as a neutral backend. We show this composition has a hidden failure mode. In a high-overlap, non-adaptive 8-domain continual LM, all shared-routing projection baselines collapse close to va…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.5

Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics

2026-05-06 · Andreas Pattichis, Constantine Dovrolis

Research Track A · General AI

LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen wha…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.3

Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents

2026-04-14 · Benjamin Stern, Peter Nadel

General AI

LLM agents with persistent memory store information as flat factual records, providing little context for temporal reasoning, change tracking, or cross-session aggregation. Inspired by the drawing effect [3], we introduce dual-trace memory encoding. In this method, each stored fact is paired with a concrete scene trace…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.3

From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents

2026-04-21 · Md Nayem Uddin, Kumar Shubham, Eduardo Blanco, Chitta Baral, Gengyu Wang

Research Track A · General AI

Personalized agents that interact with users over long periods must maintain persistent memory across sessions and update it as circumstances change. However, existing benchmarks predominantly frame long-term memory evaluation as fact retrieval from past conversations, providing limited insight into agents' ability to …

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.2

Exploration Hacking: Can LLMs Learn to Resist RL Training?

2026-04-30 · Eyon Jang, Damon Falck, Joschka Braun, Nathalie Kirch, Achu Menon, Perusha Moodley, Scott Emmons, Roland S. Zimmermann, David Lindner

General AI

Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment. Successful RL relies on sufficient exploration of diverse actions by the model during training, which creates a potential failure mode: a model could strategically alt…

Review
pending
Role
unreviewed
Read
now
huggingface Score 20.0

ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models

2026-04-09 · Chonghan Qin, Xiachong Feng, Weitao Ma, Xiaocheng Feng, Lingpeng Kong

General AI

Existing memory benchmarks for LLM agents evaluate explicit recall of facts, yet overlook implicit memory where experience becomes automated behavior without conscious retrieval. This gap is critical: effective assistants must automatically apply learned procedures or avoid failed actions without explicit reminders. We…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.0

Task Switching Without Forgetting via Proximal Decoupling

2026-04-20 · Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, William A. P. Smith, Yue Lu

Research Track A · General AI

In continual learning, the primary challenge is to learn new information without forgetting old knowledge. A common solution addresses this trade-off through regularization, penalizing changes to parameters critical for previous tasks. In most cases, this regularization term is directly added to the training loss and o…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.0

CoMemNet: Contrastive Sampling with Memory Replay Network for Continual Traffic Prediction

2026-05-07 · Mei Wu, Wenchao Weng, Wenxin Su, Wenjie Tang, Wei Zhou

Research Track A · General AI

In recent years, the integration of non-topological space modeling with temporal learning methods has emerged as an effective approach for capturing spatio-temporal information in non-Euclidean graphs. However, most existing methods rely on static underlying graph structures, which are inadequate for capturing the cont…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.0

Intrinsic Vicarious Conditioning for Deep Reinforcement Learning

2026-05-12 · Rodney A Sanchez, Ferat Sahin, Alex Ororbia, Jamison Heard

Research Track A · General AI

Advancements in reinforcement learning have produced a variety of complex and useful intrinsic driving forces; crucially, these drivers operate under a direct conditioning paradigm. This form of conditioning limits our agents' capacity by restricting how they learn from the environment as well as from others. Off-polic…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.9

cotomi Act: Learning to Automate Work by Watching You

2026-05-04 · Masafumi Oyamada, Kunihiro Takeoka, Kosuke Akimoto, Ryoma Obara, Masafumi Enomoto, Haochen Zhang, Daichi Haraguchi, Takuya Tamura

Research Track B · General AI

What if a browser agent could learn your work simply by watching you do it? We present cotomi Act, a browser-based computer-using agent that combines reliable multi-step task execution with persistent organizational knowledge learned from user behavior. For execution, an agent scaffold with adaptive lazy observation, v…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.8

Optimizing LLM Prompt Engineering with DSPy Based Declarative Learning

2026-04-06 · Shiek Ruksana, Sailesh Kiran Kurra, Thipparthi Sanjay Baradwaj

General AI

Large Language Models (LLMs) have shown strong performance across a wide range of natural language processing tasks; however, their effectiveness is highly dependent on prompt design, structure, and embedded reasoning signals. Conventional prompt engineering methods largely rely on heuristic trial-and-error processes, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.8

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

2026-04-09 · Ziwei Zhou, Zeyuan Lai, Rui Wang, Yifan Yang, Zhen Xing, Yuqing Yang, Qi Dai, Lili Qiu, Chong Luo

General AI

Text-to-Audio-Video (T2AV) generation is rapidly becoming a core interface for media creation, yet its evaluation remains fragmented. Existing benchmarks largely assess audio and video in isolation or rely on coarse embedding similarity, failing to capture the fine-grained joint correctness required by realistic prompt…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.8

EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools

2026-04-09 · Boer Zhang, Mingyan Wu, Dongzhuoran Zhou, Yuqicheng Zhu, Wendong Fan, Puzhen Zhang, Zifeng Ding, Guohao Li, Yuan He

Research Track B · General AI

Deep research requires reasoning over web evidence to answer open-ended questions, and it is a core capability for AI agents. Yet many deep research agents still rely on implicit, unstructured search behavior that causes redundant exploration and brittle evidence aggregation. Motivated by Anthropic's "think" tool parad…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.5

FeDMRA: Federated Incremental Learning with Dynamic Memory Replay Allocation

2026-03-30 · Tiantian Wang, Xiang Xiang, Simon S. Du

Research Track A · General AI

In federated healthcare systems, Federated Class-Incremental Learning (FCIL) has emerged as a key paradigm, enabling continuous adaptive model learning among distributed clients while safeguarding data privacy. However, in practical applications, data across agent nodes within the distributed framework often exhibits n…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.5

GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

2026-04-06 · Yuwen Zhai, Runze Li, Liang Wang, Nian Shi, Liwu Xu, Wei Zhang, Ran Lin, Bo Xu, Benlei Cui

Research Track B · General AI

Evaluating GUI agents presents a distinct challenge: trajectories are long, visually grounded, and open-ended, yet evaluation must be both accurate and interpretable. Existing approaches typically apply a single holistic judgment over the entire action-observation sequence, a strategy that proves unreliable on long-hori…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.5

KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks

2026-05-12 · Minjong Cheon

Research Track A · General AI

Catastrophic forgetting remains the central obstacle in continual learning (CL): parameters shared across tasks interfere with one another, and existing regularization methods such as EWC and SI apply uniform penalties without awareness of which input region a parameter serves. We propose KAN-CL, a continual learning f…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.5

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

2026-05-12 · Neha Verma, Nikhil Mehta, Shao-Chuan Wang, Naijing Zhang, Alicia Tsai, Li Wei, Lukasz Heldt, Lichan Hong, Ed Chi, Xinyang Yi

Research Track A · General AI

Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates and addresses this challenge in the context of the Generative Retrieval (GenRetrieval) t…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.4

When Continual Learning Moves to Memory: A Study of Experience Reuse in LLM Agents

2026-04-29 · Qisheng Hu, Quanyu Long, Wenya Wang

Research Track A · General AI

Memory-augmented LLM agents offer an appealing shortcut to continual learning: rather than updating model parameters, they accumulate experience in external memory, seemingly sidestepping the stability-plasticity dilemma of parametric learning. We show that this challenge does not disappear but resurfaces at the memory…

Review
pending
Role
unreviewed
Read
now
huggingface Score 19.4

Heterogeneous Scientific Foundation Model Collaboration

2026-04-30 · Zihao Li, Jiaru Zou, Feihao Fang, Xuying Ning, Mengting Ai, Tianxin Wei, Sirui Chen, Xiyuan Yang, Jingrui He

General AI

Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address special…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.3

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

2026-04-13 · Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

Research Track B · General AI

GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of …

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.3

Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

2026-04-13 · Xiaozhe Li, Tianyi Lyu, Yizhao Yang, Liang Shan, Siyi Yang, Ligao Zhang, Zhuoyi Huang, Qingwen Liu, Yang Li

Research Track B · General AI

Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context manag…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.3

Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving

2026-04-13 · Haojie Bai, Aimin Li, Ruoyu Yao, Xiongwei Zhao, Tingting Zhang, Xing Zhang, Lin Gao, Jun Ma

General AI

Closed-loop cooperative driving requires planners that generate realistic multimodal multi-agent trajectories while improving safety and traffic efficiency. Existing diffusion planners can model multimodal behaviors from demonstrations, but they often exhibit weak scene consistency and remain poorly aligned with closed…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.3

From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench

2026-04-16 · Ke Xu, Yuhao Wang, Yu Wang

General AI

Recent advancements in LLM agents are gradually shifting from reactive, text-based paradigms toward proactive, multimodal interaction. However, existing benchmarks primarily focus on reactive responses, overlooking the complexities of proactive intervention and monitoring. To bridge this gap, we introduce ProVoice-Benc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.3

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

2026-04-16 · Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo

Research Track B · General AI

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often lea…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.3

SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models

2026-04-22 · Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele

General AI

Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated the great potential of enhancing the reasoning abilities in multimodal large language models (MLLMs). However, the reliance on language-centric priors and expensive manual annotations prevents MLLMs' intrinsic visual understanding and scalable …

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.3

ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models

2026-05-12 · Chen Li, Xiaoling Hu, Songzhu Zheng, Jiawei Zhou, Chao Chen

General AI

Large language models (LLMs) often produce answers with high certainty even when they are incorrect, making reliable confidence estimation essential for deployment in real-world scenarios. Verbalized confidence, where models explicitly state their confidence in natural language, provides a flexible and user-facing unce…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.2

FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards

2026-04-29 · Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue, Kefei Chen, Yu Zhuang, Haoxiang Guan, Jiyan He, Jian Li, Yitong Duan, Yu Shi, Mengting Hu, Shuxin Zheng

General AI

Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. Just as interactive environments have often dr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.2

AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images

2026-04-30 · Bo Zhang, Tzu-Yen Ma, Zichen Tang, Junpeng Ding, Zirui Wang, Yizhuo Zhao, Peilin Gao, Zijie Xi, Zixin Ding, Haiyang Sun, Haocheng Gao, Yuan Liu, Liangjia Wang, Yiling Huang, Yujie Wang, Yuyue Zhang, Ronghui Xi, Yuanze Li, Jiacheng Liu, Zhongjun Yang, Haihong E

General AI

We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS features three key advances: (1) Domain-Specific Complexity: covering seven academic categories with 39 fine-grained subtypes, exposing intrinsic forensic difficulty, where e…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.2

SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images

2026-04-30 · Jialu Shen, Han Lyu, Suyang Zhong, Hanzheng Li, Haoyi Tao, Nan Wang, Changhong Chen, Xi Fang

General AI

Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-specific characteristics. Here we introduce SpecVQA, a professional scientific-image benchmark for evaluating multimodal mod…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.2

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

2026-05-01 · Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng

General AI

While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence lengt…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.2

FT-RAG: A Fine-grained Retrieval-Augmented Generation Framework for Complex Table Reasoning

2026-05-02 · Zebin Guo, Weidong Geng, Ruichen Mao

General AI

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding responses in external knowledge during inference. However, conventional RAG systems under-perform on structured tabular data, largely due to coarse retrieval granularity and insufficient table semantic comprehension. To address these…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.0

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation

2026-05-06 · Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Stjepan Picek, Saraga Sakthidharan

Research Track A · General AI

The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank Adaptation (LoRA) modules. However, integrating these third-party adapters often induces catastrophic forgetting of the base model's foundational safety alignment. Restor…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.8

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

2026-03-15 · Mohamed Aghzal, Gregory J. Stein, Ziyu Yao

Research Track B · General AI

Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze w…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.8

Demographic Fairness in Multimodal LLMs: A Benchmark of Gender and Ethnicity Bias in Face Verification

2026-03-26 · Ünsal Öztürk, Hatef Otroshi Shahreza, Sébastien Marcel

General AI

Multimodal Large Language Models (MLLMs) have recently been explored as face verification systems that determine whether two face images are of the same person. Unlike dedicated face recognition systems, MLLMs approach this task through visual prompting and rely on general visual and reasoning abilities. However, the d…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.8

ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents

2026-03-26 · Cristian Lupascu, Alexandru Lupascu

Research Track A · General AI

Large Language Model based agents increasingly operate in high-stakes, multi-turn settings where factual grounding is critical, yet their memory systems typically rely on flat key-value stores or plain vector retrieval with no mechanism to track the provenance or trustworthiness of stored knowledge. We present Elephant…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.8

Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning

2026-03-30 · Ziqi Miao, Haonan Jia, Lijun Li, Chen Qian, Yuan Xiong, Wenting Yan, Jing Shao

General AI

Reinforcement learning with verifiable rewards (RLVR) has substantially enhanced the reasoning capabilities of multimodal large language models (MLLMs). However, existing RLVR approaches typically rely on outcome-driven optimization that updates both perception and reasoning using a shared reward based solely on the fi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.8

Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models

2026-03-31 · Md Saad, Sajjad Hussain, Mohd Suhaib

General AI

This paper introduces a new hybrid framework that combines Reinforcement Learning (RL) and Large Language Models (LLMs) to improve robotic manipulation tasks. By utilizing RL for accurate low-level control and LLMs for high-level task planning and understanding of natural language, the proposed framework effectively co…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.8

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks

2026-04-27 · Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

Research Track B · General AI

Existing web agent benchmarks have largely converged on short, single-site tasks that frontier models are approaching saturation on. However, real-world web use consists of long-horizon, multi-site workflows. Common web navigation tasks, such as comparing products across different domains, planning trips across multipl…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.8

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

2026-05-07 · Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah, Corey Feld

General AI

Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation (RAG) that does not…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.7

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

2026-04-23 · Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie

Research Track B · General AI

Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic framework built around three integrated comp…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.5

ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution

2026-01-12 · Jihong Wang, Jiamu Zhou, Weiming Zhang, Weiwen Liu, Zhuosheng Zhang, Xingyu Lou, Weinan Zhang, Huarong Deng, Jun Wang

Research Track B · General AI

With the advancement of vision-language models, web automation has made significant progress. However, deploying autonomous agents in real-world settings remains challenging, primarily due to site heterogeneity, where generalist models lack domain-specific priors for diverse interfaces, and long-horizon instability, ch…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.5

Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents

2026-03-09 · Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang

Research Track B · General AI

Modern agents powered by thinking LLMs achieve high accuracy through long chain-of-thought reasoning but incur substantial inference costs. While many LLMs now support configurable reasoning levels (e.g., high/medium/low), static strategies are often ineffective: using low-effort modes at every step leads to significan…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.5

Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents

2026-04-03 · Wei Zou, Mingwen Dong, Miguel Romero Calvo, Shuaichen Chang, Jiang Guo, Dongkyu Lee, Xing Niu, Xiaofei Ma, Yanjun Qi, Jiarong Jiang

Research Track B · General AI

Memory makes LLM-based web agents personalized, powerful, yet exploitable. By storing past interactions to personalize future tasks, agents inadvertently create a persistent attack surface that spans websites and sessions. While existing security research on memory assumes attackers can directly inject into memory stor…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.5

3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding

2026-04-09 · Makanjuola Ogunleye, Eman Abdelrahman, Ismini Lourentzou

General AI

Large multimodal models are increasingly used as the reasoning core of embodied agents operating in 3D environments, yet they remain prone to hallucinations that can produce unsafe and ungrounded decisions. Existing inference-time hallucination mitigation methods largely target 2D vision-language settings and do not tr…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.5

Towards Autonomous Mechanistic Reasoning in Virtual Cells

2026-04-14 · Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi

General AI

Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, w…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.5

Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs

2026-04-17 · Rohit Sinha, Aditya Kanade, Sai Srinivas Kancheti, Vineeth N Balasubramanian, Tanuja Ganu

General AI

Multimodal large language models (MLLMs) have achieved impressive progress on vision-language benchmarks, yet their capacity for visual cognitive and visuospatial reasoning remains less understood. We introduce "Mind's Eye", a multiple-choice benchmark of eight visuo-cognitive tasks inspired by classic human intelligen…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.5

Exploring Spatial Intelligence from a Generative Perspective

2026-04-22 · Muzhi Zhu, Shunyao Jiang, Huanyi Zheng, Zekai Luo, Hao Zhong, Anzhou Li, Kaijun Wang, Jintao Rong, Yang Liu, Hao Chen, Tao Lin, Chunhua Shen

General AI

Spatial intelligence is essential for multimodal large language models, yet current benchmarks largely assess it only from an understanding perspective. We ask whether modern generative or unified multimodal models also possess generative spatial intelligence (GSI), the ability to respect and manipulate 3D spatial cons…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.5

WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

2026-04-22 · Juyong Jiang, Chenglin Cai, Chansung Park, Jiasi Shen, Sunghun Kim, Jianguo Li, Yue Wang

General AI

While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic multi-page websites remain highly challenging. Existing works are often limited to single-page static websites, while agentic frameworks typically rely on multi-turn execu…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.4

Forager: a lightweight testbed for continual learning with partial observability in RL

2026-05-01 · Steven Tang, Xinze Xiong, Anna Hakhverdyan, Andrew Patterson, Jacob Adkins, Jiamin He, Esraa Elelimy, Parham Mohammad Panahi, Martha White, Adam White

Research Track A · General AI

In continual reinforcement learning (CRL), good performance requires never-ending learning, acting, and exploration in a big, partially observable world. Most CRL experiments have focused on loss of plasticity -- the inability to keep learning -- in one-off experiments where some unobservable non-stationarity is added …

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.4

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

2026-05-01 · Ziwen Zhao, Menglin Yang

General AI

Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cro…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.4

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

2026-05-03 · Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang

General AI

Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, loc…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.4

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments

2026-05-04 · Ruoqi Liu, Imran Q. Mohiuddin, Austin J. Schoeffler, Kavita Renduchintala, Ashwin Nayak, Prasantha L. Vemu, Shivam C. Vedak, Kameron C. Black, John L. Havlik, Isaac Ogunmola, Stephen P. Ma, Roopa Dhatt, Jonathan H. Chen

General AI

We introduce PhysicianBench, a benchmark for evaluating LLM agents on physician tasks grounded in real clinical settings within electronic health record (EHR) environments. Existing medical agent benchmarks primarily focus on static knowledge recall, single-step atomic actions, or action intent without verifiable execut…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning

2026-04-12 · Cheng-Yen Li, Xuanjun Chen, Claire Lin, Wei-Yu Chen, Wenhua Nie, Hung-Yi Lee, Jyh-Shing Roger Jang

Research Track A · General AI

Large Language Models (LLMs) struggle with knowledge-intensive tasks due to hallucinations and fragmented reasoning over dispersed information. While Retrieval-Augmented Generation (RAG) grounds generation in external sources, existing methods often treat evidence as isolated units, failing to reconstruct the logical c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

2026-04-13 · Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak

General AI

We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathem…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo

2026-04-13 · Artem Gadzhiev, Andrew Kislov

General AI

Providing AI agents with reliable long-term memory that does not hallucinate remains an open problem. Current approaches to memory for LLM agents -- sliding windows, summarization, embedding-based RAG, and flat fact extraction -- each reduce token cost but introduce catastrophic information loss, semantic drift, or unc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

Boosting Visual Instruction Tuning with Self-Supervised Guidance

2026-04-14 · Sophia Sirko-Galouchenko, Monika Wysoczanska, Andrei Bursuc, Nicolas Thome, Spyros Gidaris

General AI

Multimodal large language models (MLLMs) perform well on many vision-language tasks but often struggle with vision-centric problems that require fine-grained visual reasoning. Recent evidence suggests that this limitation arises not from weak visual representations, but from under-utilization of visual information duri…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents

2026-04-14 · Yulin Chen, Tri Cao, Haoran Li, Yue Liu, Yibo Li, Yufei He, Le Minh Khoi, Yangqiu Song, Shuicheng Yan, Bryan Hooi

Research Track B · General AI

Web agents powered by vision-language models (VLMs) enable autonomous interaction with web environments by perceiving and acting on both visual and textual webpage content to accomplish user-specified tasks. However, they are highly vulnerable to prompt injection attacks, where adversarial instructions embedded in HTML…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

AutoPKG: An Automated Framework for Dynamic E-commerce Product-Attribute Knowledge Graph Construction

2026-04-18 · Pollawat Hongwimol, Haoning Shang, Chutong Wang, Zhichao Wan, Yi Gao, Yuanming Li, Lin Gui, Wenhao Sun, Cheng Yu

Research Track A · General AI

Product attribute extraction in e-commerce is bottlenecked by ontologies that are inconsistent, incomplete, and costly to maintain. We present AutoPKG, a multi-agent Large Language Model (LLM) framework that automatically constructs a Product-attribute Knowledge Graph (PKG) from multimodal product content. AutoPKG indu…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion

2026-04-20 · Terry Leitch

General AI

We present a systematic evaluation of large language model families -- spanning both proprietary cloud APIs and locally-hosted open-source models -- on two purpose-built benchmarks for System Dynamics AI assistance: the CLD Leaderboard (53 tests, structured causal loop diagram extraction) and the Discu…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

2026-04-20 · Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba

General AI

Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems toget…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

WorldDB: A Vector Graph-of-Worlds Memory Engine with Ontology-Aware Write-Time Reconciliation

2026-04-20 · Harish Santhanalakshmi Ganesan

General AI

Persistent memory is the bottleneck separating stateless chatbots from long-running agentic systems. Retrieval-augmented generation (RAG) over flat vector stores fragments facts into chunks, loses cross-session identity, and has no first-class notion of supersession or contradiction. Recent bitemporal knowledge-graph s…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

Time Series Augmented Generation for Financial Applications

2026-04-21 · Anton Kolonin, Alexey Glushchenko, Evgeny Bochkov, Abhishek Saxena

General AI

Evaluating the reasoning capabilities of Large Language Models (LLMs) for complex, quantitative financial tasks is a critical and unsolved challenge. Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations. To address this, we introduce a novel evaluation methodol…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks

2026-04-21 · Jing Jin, Hao Liu, Yan Bai, Yihang Lou, Zhenke Wang, Tianrun Yuan, Juntong Chen, Yongkang Zhu, Fanhu Zeng, Xuanyu Zhu, Yige Xu

General AI

Multimodal large language models (MLLMs) have shown promising reasoning abilities, yet evaluating their performance in specialized domains remains challenging. STEM reasoning is a particularly valuable testbed because it provides highly verifiable feedback, but existing benchmarks often permit unimodal shortcuts due to…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

2026-05-12 · Yanting Miao, Yutao Sun, Dexin Wang, Mengyu Zhou, Pascal Poupart, Lei Lv, Qi Zhao, Li Wang, Hao Li, Xiaoxi Jiang, Guanjun Jiang

General AI

Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding external tools or image generators. However, existing methods usually follow an output-as-input latent paradigm and yield unstable gains. We identify evidence for a feature-space mism…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.3

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

2026-05-12 · Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran, Valeriu Lacatusu, Sylvestre-Alvise Rebuffi, Alexandre Mourachko, Surya Parimi, Christophe Ropers, Rashel Moritz, Vanessa Stark, Hady Elsahar, Pierre Fernandez

General AI

We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as specula…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.2

Contextual Agentic Memory is a Memo, Not True Memory

2026-04-30 · Binyan Xu, Xilin Dai, Kehuan Zhang

Research Track A · General AI

Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error with provable consequences for agent capability, long-term learning, and security. Retrie…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.2

PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning

2026-04-30 · Sudong Wang, Weiquan Huang, Xiaomin Yu, Zuhao Yang, Hehai Lin, Keming Wu, Chaojun Xiao, Chen Chen, Wenxuan Wang, Beier Zhu, Yunjian Zhang, Chengwei Qin

General AI

The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (RLVR). However, SFT introduces distributional drift that neither preserves the model's original capabilities nor faithfully matc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.2

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

2026-05-03 · Arash Ahmadi, Sarah Sharif, Yaser Banad

General AI

Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives policy optimization. This paper introduc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.2

Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

2026-05-04 · Chenchen Zhang

General AI

As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individual actions but also how work is spawned, delegated, communicated, aggregated, and stopped. This paper studies RL for LLM-based multi-agent systems through orchestration…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.0

All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation

2026-03-15 · Xudong Wang, Gan Li, Zhiyu Liu, Yao Wang, Lianqing Liu, Zhi Han

Research Track A · General AI

Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong V…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.0

Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

2026-03-26 · Dingjie Song, Tianlong Xu, Yi-Fan Zhang, Hang Li, Zhiling Yan, Xing Fan, Haoyang Li, Lichao Sun, Qingsong Wen

General AI

Assessing student handwritten scratchwork is crucial for personalized educational feedback but presents unique challenges due to diverse handwriting, complex layouts, and varied problem-solving approaches. Existing educational NLP primarily focuses on textual responses and neglects the complexity and multimodality inhe…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.0

MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

2026-03-29 · Shijian Wang, Jiarui Jin, Runhao Fu, Zexuan Yan, Xingjian Wang, Mengkang Hu, Eric Wang, Xiaoxi Li, Kangning Zhang, Li Yao, Wenxiang Jiao, Xuelian Cheng, Yuan Lu, Zongyuan Ge

General AI

Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage st…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.0

PRBench: End-to-end Paper Reproduction in Physics Research

2026-03-29 · Shi Qiu, Junyi Deng, Yiwei Deng, Haoran Dong, Jieyu Fu, Mao Li, Zeyu Li, Zhaolong Zhang, Huiwen Zheng, Leidong Bao, Anqi Lv, Zihan Mo, Yadi Niu, Yiyang Peng, Yu Tian, Yili Wang, Ziyu Wang, Zi-Yu Wang, Jiashen Wei, Liuheng Wu, Aoran Xue, Leyi Yang, Guanglu Yuan, Xiarui Zhan, Jingjun Zhang, Zifan Zheng, Pengfei Liu, Linrui Zhen, Kaiyang Li, Qichang Li, Ziheng Zhou, Guo-En Nian, Yunwei Xiao, Qing-Hong Cao, Linjie Dai, Xu Feng, Peng Gao, Ying Gu, Chang Liu, Jia Liu, Ming-xing Luo, Yan-Qing Ma, Liang-You Peng, Huichao Song, Shufeng Wang, Chenxu Wang, Tao Wang, Yi-Nan Wang, Chengyin Wu, Pengwei Zhao, Hua Xing Zhu

General AI

AI agents powered by large language models exhibit strong reasoning and problem-solving capabilities, enabling them to assist scientific research tasks such as formula derivation and code generation. However, whether these agents can reliably perform end-to-end reproduction from real scientific papers remains an open q…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.0

Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO

2026-04-14 · Zhiyuan Zeng, Jiameng Huang, Zhangyue Yin, Jiashuo Liu, Ziniu Li, Bingrui Li, Yuhao Wu, Yining Zheng, Ge Zhang, Wenhao Huang, Xipeng Qiu

General AI

Reinforcement learning with verifiable rewards (RLVR) has become a central paradigm for improving reasoning and code generation in large language models, and GRPO-style training is widely adopted for its simplicity and effectiveness. However, an important design choice remains underexplored: how token-level policy grad…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.0

MCPO: Mastery-Consolidated Policy Optimization for Large Reasoning Models

2026-04-18 · Zhaokang Liao, Yingguo Gao, Yi Yang, Yongheng Hu, Jingting Ding

Research Track A · General AI

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach to improve the reasoning abilities of Large Language Models (LLMs). Among RLVR algorithms, Group Relative Policy Optimization (GRPO) and its variants have demonstrated strong performance and high training efficiency. However, GRPO…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.0

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

2026-04-24 · Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia

Research Track B · General AI

As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.0

PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

2026-05-09 · Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, Christopher G. Brinton

General AI

Large language model (LLM) agents face a structural tension: cloud agents provide strong reasoning but expose user data, while on-device agents preserve privacy at the cost of overall capability. Existing device-cloud designs treat this boundary as a compute split rather than a trust boundary suited to agentic workload…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.0

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

2026-05-12 · Xuhao Hu, Xi Zhang, Haiyang Xu, Kyle Qiao, Jingyi Yang, Xuanjing Huang, Jing Shao, Ming Yan, Jieping Ye

Research Track B · General AI

Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal execution paths. This diffi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.0

Unlocking Compositional Generalization in Continual Few-Shot Learning

2026-05-12 · Phu-Quy Nguyen-Lam, Phu-Hoa Pham, Dao Sy Duy Minh, Chi-Nguyen Tran, Huynh Trung Kiet, Long Tran-Thanh

Research Track A · General AI

Object-centric representations promise a key property for few-shot learning: Rather than treating a scene as a single unit, a model can decompose it into individual object-level parts that can be matched and compared across different concepts. In practice, this potential is rarely realized. Continual learners either co…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.9

MPCS: Neuroplastic Continual Learning via Multi-Component Plasticity and Topology-Aware EWC

2026-05-04 · Joern Hentsch

Research Track A · General AI

Continual learning systems face a fundamental tension between plasticity -- acquiring new knowledge -- and stability -- retaining prior knowledge. We introduce MPCS (Multi-Plasticity Continual System), a neuroplastic architecture that integrates eleven complementary mechanisms: task-driven neurogenesis, Fourier-encoded…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos

2026-03-26 · Abdullah Hamdi, Changchun Yang, Xin Gao

General AI

Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic …

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?

2026-03-26 · Liang Zhang, Yu Fu, Xinyi Jin

General AI

Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship us…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

LanteRn: Latent Visual Structured Reasoning

2026-03-26 · André G. Viveiros, Nuno Gonçalves, Matthias Lindemann, André Martins

General AI

While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. While recent approaches…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding

2026-03-26 · Jiwook Han, Geo Ahn, Youngrae Kim, Jinwoo Choi

General AI

Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memorize dataset-specific s…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

2026-03-26 · Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, Guanjun Jiang

General AI

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or seq…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning

2026-03-30 · Huanxuan Liao, Zhongtao Jiang, Yupu Hao, Yuqiao Tan, Shizhu He, Jun Zhao, Kun Xu, Kang Liu

General AI

Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long temporal context prohibitive. We argue that the bottleneck lies not in how post-encoding representations are compresse…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

EC-Bench: Enumeration and Counting Benchmark for Ultra-Long Videos

2026-03-31 · Fumihiko Tsuchiya, Taiki Miyanishi, Mahiro Ukai, Nakamasa Inoue, Shuhei Kurita, Yusuke Iwasawa, Yutaka Matsuo

General AI

Counting in long videos remains a fundamental yet underexplored challenge in computer vision. Real-world recordings often span tens of minutes or longer and contain sparse, diverse events, making long-range temporal reasoning particularly difficult. However, most existing video counting benchmarks focus on short clips …

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

DIRECT: Video Mashup Creation via Hierarchical Multi-Agent Planning and Intent-Guided Editing

2026-04-06 · Ke Li, Maoliang Li, Jialiang Chen, Jiayu Chen, Zihao Zheng, Shaoqi Wang, Xiang Chen

General AI

Video mashup creation represents a complex video editing paradigm that recomposes existing footage to craft engaging audio-visual experiences, demanding intricate orchestration across semantic, visual, and auditory dimensions and multiple levels. However, existing automated editing frameworks often overlook the cross-l…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

2026-04-06 · Shuai Liu, Shulin Tian, Kairui Hu, Yuhao Dong, Zhe Yang, Bo Li, Jingkang Yang, Chen Change Loy, Ziwei Liu

General AI

Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints, as strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent scalable training an…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

ACE-Bench: Agent Configurable Evaluation with Scalable Horizons and Controllable Difficulty under Lightweight Environments

2026-04-07 · Wang Yang, Chaoda Song, Xinpeng Li, Debargha Ganguly, Chuang Ma, Shouren Wang, Zhihao Dou, Yuli Zhou, Vipin Chaudhary, Xiaotian Han

General AI

Existing Agent benchmarks suffer from two critical limitations: high environment interaction overhead (up to 41% of total evaluation time) and imbalanced task horizon and difficulty distributions that make aggregate scores unreliable. To address these issues, we propose ACE-Bench built around a unified grid-based plan…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

2026-04-07 · Bowen Ye, Rang Li, Qibin Yang, Yuanxin Liu, Linli Yao, Hanglong Lv, Zhihui Xie, Chenxin An, Lei Li, Lingpeng Kong, Qi Liu, Zhifang Sui, Tong Yang

General AI

Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-world software environments. However, existing agent benchmarks suffer from three critical limitations: (1) trajectory-opaque grading that checks only final outputs, (2) underspecified safety and robustness evalu…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

2026-04-07 · Juekai Lin, Yun Zhu, Honglin Lin, Sijing Li, Tianwei Lin, Zheng Liu, Xiaoyang Wang, Wenqiao Zhang, Lijun Wu

General AI

Graphics Program Synthesis is pivotal for interpreting and editing visual data, effectively facilitating the reverse-engineering of static visuals into editable TikZ code. While TikZ is the de facto standard for scientific schematics due to its programmatic flexibility, its requirement for rigorous spatial precision pr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

2026-04-09 · Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna

Research Track B · General AI

Web agents, autonomous systems that navigate and execute tasks on the web on behalf of users, have the potential to transform how people interact with the digital world. However, the most capable web agents today rely on proprietary models with undisclosed training data and recipes, limiting scientific understanding, r…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

Visually-grounded Humanoid Agents

2026-04-09 · Hang Ye, Xiaoxuan Ma, Fan Lu, Wayne Wu, Kwan-Yee Lin, Yizhou Wang

General AI

Digital human generation has been studied for decades and supports a wide range of real-world applications. However, most existing systems are passively animated, relying on privileged state or scripted control, which limits scalability to novel environments. We instead ask: how can digital humans actively behave using…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.5

Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems

2026-01-08 · Uday Allu, Sonu Kedia, Tanmay Odapally, Biddwan Ahmed

General AI

Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking strategies to balance retrieval quality, latency, and operational cost. Traditional chunking approaches, such as fixed-size, rule-based, or fully agentic chunking, often suffer from high token consumption, redundant text gener…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.5

You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

2026-04-13 · Yinuo Yang, Zixian Ma, Manasi Ganti, Jieyu Zhang, Ranjay Krishna

General AI

We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward models evaluate each response independently, requiring multiple forward passes, one for each potential response. Our approach concatenates multiple responses with separato…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.5

Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness

2026-04-14 · Tomer Ashuach, Liat Ein-Dor, Shai Gretz, Yoav Katz, Yonatan Belinkov

General AI

Humans use introspection to evaluate their understanding through private internal states inaccessible to external observers. We investigate whether large language models possess similar privileged knowledge about answer correctness, information unavailable through external observation. We train correctness classifiers …

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.5

AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

2026-04-15 · Genghan Zhang, Shaowei Zhu, Anjiang Wei, Zhenyu Song, Allen Nie, Zhen Jia, Nandita Vijaykumar, Yida Wang, Kunle Olukotun

General AI

We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.5

GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows

2026-04-17 · Jize Wang, Xuanxuan Liu, Yining Li, Songyang Zhang, Yijun Wang, Zifei Shan, Xinyi Le, Cailian Chen, Xinping Guan, Dacheng Tao

General AI

The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements, relying on AI-generated queries, dummy tools, and limited system-level coordination…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.5

CreativeGame: Toward Mechanic-Aware Creative Game Generation

2026-04-21 · Hongnan Ma, Han Wang, Shenglin Wang, Tieyue Yin, Yiwei Shi, Yucong Huang, Yingtian Zou, Muning Wen, Mengyue Yang

General AI

Large language models can generate plausible game code, but turning this capability into iterative creative improvement remains difficult. In practice, single-shot generation often produces brittle runtime behavior, weak accumulation of experience across versions, and creativity scores that are too subjective to serve …

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.5

MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings

2026-04-21 · Zijie Li, Yichun Shi, Jingxiang Sun, Ye Wang, Yixuan Huang, Zhiyao Guo, Xiaochen Lian, Peihao Zhu, Yu Tian, Zhonghua Zhai, Peng Wang

General AI

We present MMCORE, a unified framework designed for multimodal image generation and editing. MMCORE leverages a pre-trained Vision-Language Model (VLM) to predict semantic visual embeddings via learnable query tokens, which subsequently serve as conditioning signals for a diffusion model. This streamlined design effect…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.5

SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution

2026-04-21 · Xiachong Feng, Yi Jiang, Xiaocheng Feng, Deyi Yin, Libo Qin, Yangfan Ye, Lei Huang, Weitao Ma, Yuxuan Gu, Chonghan Qin, Bing Qin, Lingpeng Kong

General AI

Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.5

ZenBrain: A Neuroscience-Inspired 7-Layer Memory Architecture for Autonomous AI Systems

2026-04-26 · Alexander Bering

Research Track A · General AI

Despite a century of empirical memory research, existing AI agent memory systems rely on system-engineering metaphors (virtual-memory paging, flat LLM storage, Zettelkasten notes), none integrating principles of consolidation, forgetting, and reconsolidation. We present ZenBrain, a multi-layer memory architecture integ…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.5

Improving Vision-language Models with Perception-centric Process Reward Models

2026-04-27 · Yingqian Min, Kun Zhou, Yifan Li, Yuhuan Wu, Han Peng, Yifan Du, Wayne Xin Zhao, Min Yang, Ji-Rong Wen

General AI

Recent advancements in reinforcement learning with verifiable rewards (RLVR) have significantly improved the complex reasoning ability of vision-language models (VLMs). However, its outcome-level supervision is too coarse to diagnose and correct errors within the reasoning chain. To this end, we propose Perceval, a pro…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.5

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

2026-05-07 · Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan

Research Track B · General AI

The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. …

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.4

Step-level Optimization for Efficient Computer-use Agents

2026-04-29 · Jinbiao Wei, Kangqi Ni, Yilun Zhao, Guo Gan, Arman Cohan

General AI

Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and …

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.4

InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

2026-04-30 · Qiyao Wang, Haoran Hu, Longze Chen, Hongbo Wang, Hamid Alinejad-Rokny, Yuan Lin, Min Yang

General AI

With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially for well-structured, information-rich inputs and static execution set…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.4

Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery

2026-05-02 · Wenhao Li, Xiu Su, Yichao Cao, Hongyan Xu, Xiaobo Xia, Shan You, Yi Chen, Chang Xu

Research Track A · General AI

Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning capability, lack of status monitoring, and difficulty in self-correction. In this…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

Towards Long-horizon Agentic Multimodal Search

2026-04-14 · Yifan Du, Zikang Liu, Jinbiao Peng, Jie Wu, Junyi Li, Jinyang Li, Wayne Xin Zhao, Ji-Rong Wen

General AI

Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managing the heterogeneous information and high token costs associated with multimodal inputs over long horizons remains a critical challenge, as existing methods often suffe…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

2026-04-20 · Jinghui Lu, Jiayi Guan, Zhijian Huang, Jinlong Li, Guang Li, Lingdong Kong, Yingyan Li, Han Wang, Shaoqing Xu, Yuechen Luo, Fang Li, Chenxu Dang, Junli Wang, Tao Xu, Jing Wu, Jianhua Wu, Xiaoshuai Hao, Wen Zhang, Tianyi Jiang, Lingfeng Zhang, Lei Zhou, Yingbo Tang, Jie Wang, Yinfeng Gao, Xizhou Bu, Haochen Tian, Yihang Qiu, Feiyang Jia, Lin Liu, Yigu Ge, Hanbing Li, Yuannan Shen, Jianwei Cui, Hongwei Xie, Bing Wang, Haiyang Sun, Jingwei Zhao, Jiahui Huang, Pei Liu, Zeyu Zhu, Yuncheng Jiang, Zibin Guo, Chuhong Gong, Hanchao Leng, Kun Ma, Naiyang Wang, Guang Chen, Kuiyuan Yang, Hangjun Ye, Long Chen

General AI

Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive for real-time deployment. Latent CoT methods attempt to close this gap by compressing reasoning into continuous hidden states, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

Self-Aware Vector Embeddings for Retrieval-Augmented Generation: A Neuroscience-Inspired Framework for Temporal, Confidence-Weighted, and Relational Knowledge

2026-04-22 · Naizhong Xu

Research Track A · General AI

Modern retrieval-augmented generation (RAG) systems treat vector embeddings as static, context-free artifacts: an embedding has no notion of when it was created, how trustworthy its source is, or which other embeddings depend on it. This flattening of knowledge has a measurable cost: recent work on VersionRAG reports t…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding

2026-04-24 · Lihao Zheng, Zhenwei Shao, Yu Zhou, Yan Yang, Xintian Shen, Jiawei Chen, Hao Ma, Tao Wei

General AI

Although Multimodal Large Language Models (MLLMs) have advanced rapidly, they still face notable challenges in fine-grained multi-image understanding, often exhibiting spatial hallucination, attention leakage, and failures in object constancy. In addition, existing approaches typically rely on expensive human annotatio…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning

2026-04-24 · Jichao Wang, Liuyang Bian, Yufeng Zhou, Han Xiao, Yue Pan, Guozhi Wang, Hao Wang, Zhaoxiong Wang, Yafei Wen, Xiaoxin Chen, Shuai Ren, Lingfang Zeng

Research Track B · General AI

As Multimodal Large Language Models (MLLMs) mature, GUI agents are evolving from static interactions to complex navigation. While Reinforcement Learning (RL) has emerged as a promising paradigm for training MLLM agents on dynamic GUI tasks, its effective application faces a dilemma. Standard Offline RL often relies on …

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

2026-04-27 · Xihang Wang, Zihan Wang, Chengkai Huang, Quan Z. Sheng, Lina Yao

General AI

Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides s…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

MEMCoder: Multi-dimensional Evolving Memory for Private-Library-Oriented Code Generation

2026-04-27 · Mofei Li, Taozhi Chen, Guowei Yang, Jia Li

Research Track A · General AI

Large Language Models (LLMs) excel at general code generation, but their performance drops sharply in enterprise settings that rely on internal private libraries absent from public pre-training corpora. While Retrieval-Augmented Generation (RAG) offers a training-free alternative by providing static API documentation, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers

2026-05-12 · Haoyu Wang, Yuliang Song, Tao Li, Zhiwei Deng, Yaqing Wang, Deepak Ramachandran, Eldan Cohen, Dan Roth

General AI

Large Language Models (LLMs) struggle to solve complex combinatorial problems through direct reasoning, so recent neuro-symbolic systems increasingly use them to synthesize executable solvers. A central design question is how the LLM should represent the solver, and whether it should also attempt to optimize search. We…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

LychSim: A Controllable and Interactive Simulation Framework for Vision Research

2026-05-12 · Wufei Ma, Chloe Wang, Siyi Chen, Jiawei Peng, Patrick Li, Alan Yuille

General AI

While self-supervised pretraining has reduced vision systems' reliance on synthetic data, simulation remains an indispensable tool for closed-loop optimization and rigorous out-of-distribution (OOD) evaluation. However, modern simulation platforms often present steep technical barriers, requiring extensive expertise in…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.3

OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling

2026-05-12 · Zhong Li, Zihan Guo, Xiaohan Lu, Juntao Wang, Jie Song, Chao Shen, Jiageng Wu, Mingyang Sun

Research Track A · General AI

Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization sema…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.2

AgentSim: A Platform for Verifiable Agent-Trace Simulation

2026-04-29 · Saber Zerhoudi, Michael Granitzer, Jelena Mitrovic

General AI

Training trustworthy agentic LLMs requires data that shows the grounded reasoning process, not just the final answer. Existing datasets fall short: question-answering data is outcome-only, chain-of-thought data is not tied to specific documents, and web-agent datasets track interface actions rather than the core retrie…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.2

Three-Step Nav: A Hierarchical Global-Local Planner for Zero-Shot Vision-and-Language Navigation

2026-04-29 · Wanrong Zheng, Yunhao Ge, Laurent Itti

General AI

Breakthrough progress in vision-based navigation through unknown environments has been achieved by using multimodal large language models (MLLMs). These models can plan a sequence of motions by evaluating the current view at each time step against the task and goal given to the agent. However, current zero-shot Vision-…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.2

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

2026-04-30 · Yanting Wang, Chenlong Yin, Ying Chen, Jinyuan Jia

General AI

Long-context large language models (LLMs), for example Gemini-3.1-Pro and Qwen-3.5, are widely used to empower many real-world applications, such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.2

CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

2026-05-01 · Yawen Qin, Ke Qiu, Qin Zhang

General AI

Dance serves as both a cultural cornerstone and a medium for personal expression, yet the rapid growth of online dance content has made personalized discovery increasingly difficult. Text-based dance retrieval offers a natural interface for users to search with choreographic intent, but it remains underexplored because…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.2

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

2026-05-01 · Arunabh Srivastava, Mohammad A. Khojastepour, Srimat Chakradhar, Sennur Ulukus

General AI

Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges …

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.0

MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution

2026-03-19 · Minhua Lin, Zhiwei Zhang, Hanqing Lu, Hui Liu, Xianfeng Tang, Qi He, Xiang Zhang, Suhang Wang

General AI

Memory-augmented LLM agents maintain external memory banks to support long-horizon interaction, yet most existing systems treat construction, retrieval, and utilization as isolated subroutines. This creates two coupled challenges: strategic blindness on the forward path of the memory cycle, where construction and retri…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.0

Signals: Trajectory Sampling and Triage for Agentic Interactions

2026-04-01 · Shuguang Chen, Adil Hafeez, Salman Paracha

General AI

Agentic applications based on large language models increasingly rely on multi-step interaction loops involving planning, action execution, and environment feedback. While such systems are now deployed at scale, improving them post-deployment remains challenging. Agent trajectories are voluminous and non-deterministic,…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.0

GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

2026-04-03 · Shufan Jiang, Chios Chen, Zhiyang Chen

General AI

The autonomous discovery of bugs remains a significant challenge in modern software development. Compared to code generation, the complexity of dynamic runtime environments makes bug discovery considerably harder for large language models (LLMs). In this paper, we take game development as a representative domain and in…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.0

SuperLocalMemory V3.3: The Living Brain -- Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems

2026-04-06 · Varun Pratap Bhardwaj

Research Track A · General AI

AI coding agents operate in a paradox: they possess vast parametric knowledge yet cannot remember a conversation from an hour ago. Existing memory systems store text in vector databases with single-channel retrieval, require cloud LLMs for core operations, and implement none of the cognitive processes that make human m…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.0

In-Place Test-Time Training

2026-04-07 · Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai

Research Track A · General AI

The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast we…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.0

Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization

2026-04-09 · Sai Srinivas Kancheti, Aditya Kanade, Rohit Sinha, Vineeth N Balasubramanian, Tanuja Ganu

General AI

Multimodal reasoning models (MRMs) trained with reinforcement learning with verifiable rewards (RLVR) show improved accuracy on visual reasoning benchmarks. However, we observe that accuracy gains often come at the cost of reasoning quality: generated Chain-of-Thought (CoT) traces are frequently inconsistent with the f…

Review
pending
Role
unreviewed
Read
now
huggingface Score 17.0

LPM 1.0: Video-based Character Performance Model

2026-04-09 · Ailing Zeng, Casper Yang, Chauncey Ge, Eddie Zhang, Garvey Xu, Gavin Lin, Gilbert Gu, Jeremy Pi, Leo Li, Mingyi Shi, Sheng Bi, Steven Tang, Thorn Hang, Tobey Guo, Vincent Li, Xin Tong, Yikang Li, Yuchen Sun, Yue Zhao, Yuhan Lu, Yuwei Li, Zane Zhang, Zeshi Yang, Zi Ye

General AI

Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Learning such performance from video is a promising alternative to traditional 3D pipelines. However, existing video models struggle to jointly achieve high expressiveness,…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.0

Towards a Data-Parameter Correspondence for LLMs: A Preliminary Discussion

2026-04-19 · Ou Wu

Research Track A · General AI

Large language model optimization has historically bifurcated into isolated data-centric and model-centric paradigms: the former manipulates involved samples through selection, augmentation, or poisoning, while the latter tunes model weights via masking, quantization, or low-rank adaptation. This paper establishes a un…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.0

Incremental learning for audio classification with Hebbian Deep Neural Networks

2026-04-20 · Riccardo Casciotti, Francesco De Santis, Alberto Antonietti, Annamaria Mesaros

Research Track A

The human capacity for lifelong learning is an inspiration for deep learning methods, and in particular for continual learning. In this work, we apply Hebbian learning, a biologically inspired learning process, to sound classification. We propose a kernel plasticity approach that selectively modulates network kernels…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

Environment Maps: Structured Environmental Representations for Long-Horizon Agents

2026-03-24 · Yenchia Feng, Chirag Sharma, Karime Maamari

Research Track B · General AI

Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows remains an open problem. In long-horizon settings, agents frequently suffer from cascading errors and environmental stochasticity; a single misstep in a dynamic interface can lead to task failure, resulting in h…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

Seeing to Ground: Visual Attention for Hallucination-Resilient MDLLMs

2026-03-26 · Vishal Narnaware, Animesh Gupta, Kevin Zhai, Zhenyi Wang, Mubarak Shah

General AI

Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet the architectures remain prone to multimodal hallucinations. This structural vulnerability stems from an algorithmic flaw: the decoder ranks candidate tokens based on textual likelihood without …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

BACE: LLM-based Code Generation through Bayesian Anchored Co-Evolution of Code and Test Populations

2026-03-30 · Kaushitha Silva, Srinath Perera

General AI

Large Language Models (LLMs) have demonstrated impressive capabilities in code generation. While an interactive feedback loop can improve performance, writing effective tests is a non-trivial task. Early multi-agent frameworks, such as AgentCoder, automated this process but relied on generated tests as absolute ground …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

Gen-Searcher: Reinforcing Agentic Search for Image Generation

2026-03-30 · Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue

General AI

Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we pres…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

HandX: Scaling Bimanual Motion and Interaction Generation

2026-03-30 · Zimu Zhang, Yucheng Zhang, Xiyan Xu, Ziyin Wang, Sirui Xu, Kai Zhou, Bing Zhou, Chuan Guo, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui

General AI

Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior, such as finger articulation, contact timing, and inter-hand coordination, and existing resources lack high-fidelity bimanual …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

An Empirical Study of Multi-Agent Collaboration for Automated Research

2026-03-31 · Yang Shen, Zhenyi Yi, Ziyi Zhao, Lijun Sun, Dongyang Li, Chin-Teng Lin, Yuhui Shi

Research Track A · General AI

As AI agents evolve, the community is rapidly shifting from single Large Language Models (LLMs) to Multi-Agent Systems (MAS) to overcome cognitive bottlenecks in automated research. However, the optimal multi-agent coordination framework for these autonomous agents remains largely unexplored. In this paper, we present …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

2026-04-06 · Lei Zhang, Junjiao Tian, Zhipeng Fan, Kunpeng Li, Jialiang Wang, Weifeng Chen, Markos Georgopoulos, Felix Juefei-Xu, Yuxiang Bao, Julian McAuley, Manling Li, Zecheng He

General AI

Humans paint images incrementally: they plan a global layout, sketch a coarse draft, inspect, and refine details, and most importantly, each step is grounded in the evolving visual states. However, can unified multimodal models trained on text-image interleaved datasets also imagine the chain of intermediate states? In…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control

2026-04-07 · Yuchi Wang, Haiyang Yu, Weikang Bian, Jiefeng Long, Xiao Liang, Chao Feng, Hongsheng Li

General AI

MLLMs have been successfully applied to multimodal embedding tasks, yet their generative reasoning capabilities remain underutilized. Directly incorporating chain-of-thought reasoning into embedding learning introduces two fundamental challenges. First, structural misalignment between instance-level reasoning and pairw…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

2026-04-07 · Komal Kumar, Aman Chadha, Salman Khan, Fahad Shahbaz Khan, Hisham Cholakkal

General AI

The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) have demonstrated strong potential for understanding user intent and are being trained to utilize vari…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling

2026-04-28 · Jianghao Lin, Zi Ling, Chenyu Zhou, Tianyi Xu, Ruoqing Jiang, Zizhuo Wang, Dongdong Ge

General AI

Optimization modeling underpins real-world decision-making in logistics, manufacturing, energy, and public services, but reliably solving such problems from natural-language requirements remains challenging for current large language models (LLMs). In this paper, we propose Agora-Opt, a modular agentic framework…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion

2026-04-28 · Guanglin Niu, Bo Li

General AI

Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. Therefore, we…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

Recursive Multi-Agent Systems

2026-04-28 · Xiyuan Yang, Jiaru Zou, Rui Pan, Ruizhong Qiu, Pan Lu, Shizhe Diao, Jindong Jiang, Hanghang Tong, Tong Zhang, Markus J. Buehler, Jingrui He, James Zou

General AI

Recursive or looped language models have recently emerged as a new scaling axis, iteratively refining the same model computation over latent states to deepen reasoning. We extend this scaling principle from a single model to multi-agent systems, and ask: Can agent collaboration itself be scaled through recursion? To …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

Toward Multimodal Conversational AI for Age-Related Macular Degeneration

2026-04-28 · Ran Gu, Benjamin Hou, Mélanie Hébert, Asmita Indurkar, Yifan Yang, Emily Y. Chew, Tiarnán D. L. Keenan, Zhiyong Lu

General AI

Despite strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multimodal large language models (MLLMs) integrate diagnostic predictions with clinically meaningful dialogue to support clin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

2026-05-07 · Mingwei Xu, Hao Fang

General AI

Owing to its deterministic verification, reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy Optimization (GRPO)…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation

2026-05-07 · Ziyu Zhai, Siyou Li, Juexi Shao, Juntao Yu

General AI

Developing ceramic glazes is a costly, time-consuming process of trial and error due to complex chemistry, placing a significant burden on independent artists. While recent advances in multimodal AI offer a modern solution, the field lacks the large-scale datasets required to train these models. We propose GlazyBench, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

SkillOS: Learning Skill Curation for Self-Evolving Agents

2026-05-07 · Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee

General AI

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existin…

Review
pending
Role
unreviewed
Read
now
huggingface Score 16.8

SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning

2026-05-10 · Kun Xiang, Terry Jingchen Zhang, Zirong Liu, Bokai Zhou, Yueling Tang, Junjie Yu, Jiacong Lu, Shangrui Huang, Heng Li, Likui Zhang, Kunkun Liu, Changzheng Zhang, Yangle Fang, Boqiang Guo, Hui-Ling Zhen, Dandan Tu, Yinya Huang, Xiaodan Liang

General AI

We introduce SeePhys Pro, a fine-grained modality transfer benchmark that studies whether models preserve the same reasoning capability when critical information is progressively transferred from text to image. Unlike standard vision-essential benchmarks that evaluate a single input form, SeePhys Pro features four sema…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.6

C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving

2026-05-11 · Kefei Tian, Yuansheng Lian, Kai Yang, Xiangdong Chen, Shen Li

General AI

Safety-critical planning in complex environments, particularly at urban intersections, remains a fundamental challenge for autonomous driving. Existing methods, whether rule-based or data-driven, frequently struggle to capture complex scene semantics, infer potential risks, and make reliable decisions in rare, high-ris…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.5

Non-Equilibrium Stochastic Dynamics as a Unified Framework for Insight and Repetitive Learning: A Kramers Escape Approach to Continual Learning

2026-04-05 · Gunn Kim

Research Track A · General AI

Continual learning in artificial neural networks is fundamentally limited by the stability-plasticity dilemma: systems that retain prior knowledge tend to resist acquiring new knowledge, and vice versa. Existing approaches, most notably elastic weight consolidation (EWC), address this empirically without a physical ac…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.5

Continual Visual Anomaly Detection on the Edge: Benchmark and Efficient Solutions

2026-04-07 · Manuel Barusco, Francesco Borsatti, David Petrovic, Davide Dalle Pezze, Gian Antonio Susto

Research Track A · General AI

Visual Anomaly Detection (VAD) is a critical task for many applications including industrial inspection and healthcare. While VAD has been extensively studied, two key challenges remain largely unaddressed in conjunction: edge deployment, where computational resources are severely constrained, and continual learning, w…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.5

ELC: Evidential Lifelong Classifier for Uncertainty Aware Radar Pulse Classification

2026-04-08 · Mohamed Rabie, Chinthana Panagamuwa, Konstantinos G. Kyriakopoulos

Research Track A

Reliable radar pulse classification is essential in Electromagnetic Warfare for situational awareness and decision support. Deep Neural Networks have shown strong performance in radar pulse and RF emitter recognition; however, on their own they struggle to efficiently learn new pulses and lack mechanisms for expressing…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.5

From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity

2026-04-09 · Zhuang Qi, Ying-Peng Tang, Lei Meng, Guoqing Chao, Lei Wu, Han Yu, Xiangxu Meng

Research Track A

Exemplar replay has become an effective strategy for mitigating catastrophic forgetting in federated continual learning (FCL) by retaining representative samples from past tasks. Existing studies focus on designing sample-importance estimation mechanisms to identify information-rich samples. However, they typically ove…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.5

Leveraging Complementary Embeddings for Replay Selection in Continual Learning with Small Buffers

2026-04-09 · Danit Yanowsky, Daphna Weinshall

Research Track A · General AI

Catastrophic forgetting remains a key challenge in Continual Learning (CL). In replay-based CL with severe memory constraints, performance critically depends on the sample selection strategy for the replay buffer. Most existing approaches construct memory buffers using embeddings learned under supervised objectives. Ho…

Review
pending
Role
unreviewed
Read
now
huggingface Score 16.5

HP-Edit: A Human-Preference Post-Training Framework for Image Editing

2026-04-21 · Fan Li, Chonghuinan Wang, Lina Lei, Yuping Qiu, Jiaqi Xu, Jiaxiu Jiang, Xinran Qin, Zhikai Chen, Fenglong Song, Zhixin Wang, Renjing Pei, Wangmeng Zuo

General AI

Common image editing tasks typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Meanwhile, although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from H…

Review
pending
Role
unreviewed
Read
now
huggingface Score 16.5

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

2026-04-21 · Bobo Li, Rui Wu, Zibo Ji, Meishan Zhang, Hao Fei, Min Zhang, Mong-Li Lee, Wynne Hsu

General AI

Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing …

Review
pending
Role
unreviewed
Read
now
huggingface Score 16.5

Visual Reasoning through Tool-supervised Reinforcement Learning

2026-04-21 · Qihua Dong, Gozde Sahin, Pei Wang, Zhaowei Cai, Robik Shrestha, Hao Yang, Davide Modolo

General AI

In this paper, we investigate the problem of how to effectively master tool-use to solve complex visual reasoning tasks for Multimodal Large Language Models. To achieve that, we propose a novel Tool-supervised Reinforcement Learning (ToolsRL) framework, with direct tool supervision for more effective tool-use learning.…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

Memory as Metabolism: A Design for Companion Knowledge Systems

2026-04-13 · Stefan Miteski

Research Track A · General AI

Retrieval-Augmented Generation remains the dominant pattern for giving LLMs persistent memory, but a visible cluster of personal wiki-style memory architectures emerged in April 2026: design proposals from Karpathy, MemPalace, and LLM Wiki v2 that compile knowledge into an interlinked artifact for long-term use by a …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

Cycle-Consistent Search: Question Reconstructability as a Proxy Reward for Search Agent Training

2026-04-14 · Sohyun An, Shuibenyang Yuan, Hayeon Lee, Cho-Jui Hsieh, Alexander Min

General AI

Reinforcement Learning (RL) has shown strong potential for optimizing search agents in complex information retrieval tasks. However, existing approaches predominantly rely on gold supervision, such as ground-truth answers, which is difficult to scale. To address this limitation, we propose Cycle-Consistent Search (CCS)…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning

2026-04-16 · Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Huangyu Dai, Lingtao Mao, Xuxin Zhang, Chenyi Lei, Wenwu Ou

General AI

Reinforcement learning has emerged as an effective paradigm for training large language models to perform search-augmented reasoning. However, existing approaches rely on trajectory-level rewards that cannot distinguish precise search queries from vague or redundant ones within a rollout group, and collapse to a near-z…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

MCSC-Bench: Multimodal Context-to-Script Creation for Realistic Video Production

2026-04-16 · Huanran Hu, Zihui Ren, Dingyi Yang, Liangyu Chen, Qixiang Gao, Tiezheng Ge, Qin Jin

General AI

Real-world video creation often involves a complex reasoning workflow of selecting relevant shots from noisy materials, planning missing shots for narrative completeness, and organizing them into coherent storylines. However, existing benchmarks focus on isolated sub-tasks and lack support for evaluating this full proc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

2026-04-16 · Hao Gao, Shaoyu Chen, Yifan Zhu, Yuehao Song, Wenyu Liu, Qian Zhang, Xinggang Wang

General AI

High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities and the lack of cor…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

A multimodal and temporal foundation model for virtual patient representations at healthcare system scale

2026-04-20 · Andrew Zhang, Tong Ding, Sophia J. Wagner, Caiwei Tian, Ming Y. Lu, Rowland Pettit, Joshua E. Lewis, Alexandre Misrahi, Dandan Mo, Long Phi Le, Faisal Mahmood

General AI

Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents

2026-04-22 · Yuxuan Cai, Jie Zhou, Qin Chen, Liang He

Research Track A · General AI

Online lifelong learning enables agents to accumulate experience across interactions and continually improve on long-horizon tasks. However, existing methods typically treat retrieval from past experience as a passive operation, triggering it only at task initialization or after completing a step. Consequently, agents …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

2026-04-22 · Inclusion AI, Tiwei Bie, Haoxing Chen, Tieyuan Chen, Zhenglin Cheng, Long Cui, Kai Gan, Zhicheng Huang, Zhenzhong Lan, Haoquan Li, Jianguo Li, Tao Lin, Qi Qin, Hongjun Wang, Xiaomei Wang, Haoyuan Wu, Yi Xin, Junbo Zhao

General AI

We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous vi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model

2026-04-22 · Qiguang Chen, Chengyu Luan, Jiajun Wu, Qiming Yu, Yi Yang, Yizhuo Li, Jingqi Tong, Xiachong Feng, Libo Qin, Wanxiang Che

General AI

Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Nevertheless, current Olympiad-level multimodal reasoning benchmarks for these models often emphasize single-image analysis and fail to exploit contextual information across multiple images. We present OMIBench…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation

2026-04-22 · Dongding Lin, Jian Wang, Yongqi Li, Wenjie Li

General AI

Situated conversational recommendation (SCR), which utilizes visual scenes grounded in specific environments and natural language dialogue to deliver contextually appropriate recommendations, has emerged as a promising research direction due to its close alignment with real-world scenarios. Compared to traditional reco…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents

2026-04-23 · Praval Sharma

General AI

Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios. However, existing event extraction approaches have limitations: (1) closed-domain algorithms are restricted to predefined event types and thus rarely generaliz…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

2026-04-23 · Chee Wei Tan, Yuchen Wang, Shangxin Guo

General AI

This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy L…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering

2026-04-24 · Jinghong Chen, Jingbiao Mei, Guangyu Yang, Bill Byrne

General AI

A common approach to question answering with retrieval-augmented generation (RAG) is to concatenate documents into a single context and pass it to a language model to generate an answer. While simple, this strategy can obscure the contribution of individual documents, making attribution difficult and contributing to th…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

2026-05-12 · Di Wu, Zixiang Ji, Asmi Kawatkar, Bryan Kwan, Jia-Chen Gu, Nanyun Peng, Kai-Wei Chang

Research Track B · General AI

Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, existing memory benchmarks for agents mostly focus on user histories, short traces, or downstream task success, leaving open …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.3

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

2026-05-12 · Haiwen Diao, Penghao Wu, Hanming Deng, Jiahao Wang, Shihao Bai, Silei Wu, Weichen Fan, Wenjie Ye, Wenwen Tong, Xiangyu Fan, Yan Li, Yubo Wang, Zhijie Cao, Zhiqian Lin, Zhitao Yang, Zhongang Cai, Yuwei Niu, Yue Zhu, Bo Liu, Chengguang Lv, Haojia Yu, Haozhe Xie, Hongli Wang, Jianan Fan, Jiaqi Li, Jiefan Lu, Jingcheng Ni, Junxiang Xu, Kaihuan Liang, Lianqiang Shi, Linjun Dai, Linyan Wang, Oscar Qian, Peng Gao, Pengfei Liu, Qingping Sun, Rui Shen, Ruisi Wang, Shengnan Ma, Shuang Yang, Siyi Xie, Siying Li, Tianbo Zhong, Xiangli Kong, Xuanke Shi, Yang Gao, Yongqiang Yao, Yves Wang, Zhengqi Bai, Zhengyu Lin, Zixin Yin, Wenxiu Sun, Ruihao Gong, Quan Wang, Lewei Lu, Lei Yang, Ziwei Liu, Dahua Lin

General AI

Recent large vision-language models (VLMs) remain fundamentally constrained by a persistent dichotomy: understanding and generation are treated as distinct problems, leading to fragmented architectures, cascaded pipelines, and misaligned representation spaces. We argue that this divide is not merely an engineering arti…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.2

ClawGym: A Scalable Framework for Building Effective Claw Agents

2026-04-29 · Fei Bai, Huatong Song, Shuang Sun, Daixuan Cheng, Yike Yang, Chuan Hao, Renyuan Li, Feng Chang, Yuan Wei, Ran Tao, Bryan Dai, Jian Yang, Wayne Xin Zhao

General AI

Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent trai…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.2

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

2026-04-29 · Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan

General AI

Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference steps within a single architecture, none address cross-architecture knowledge t…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.2

Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception

2026-04-30 · Neemias B da Silva, Rodrigo Minetto, Daniel Silver, Thiago H Silva

General AI

Large Language Models (LLMs) are increasingly used as proxies for human perception in urban analysis, yet it remains unclear whether persona prompting produces meaningful and reproducible behavioral diversity. We investigate whether distinct personas influence urban sentiment judgments generated by multimodal LLMs. Usi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.2

Can Coding Agents Reproduce Findings in Computational Materials Science?

2026-05-01 · Ziyang Huang, Yi Cao, Ali K. Shargh, Jing Luo, Ruidong Mei, Mohd Zaki, Zhan Liu, Wyatt Bunstine, William Jurayj, Somdatta Goswami, Tyrel McQueen, Michael Shields, Jaafar El-Awady, Paulette Clancy, Benjamin Van Durme, Nicholas Andrews, William Walden, Daniel Khashabi

General AI

Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational scientific workflows, where tasks require not only strong coding ability, but also the ab…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.2

Make Your LVLM KV Cache More Lightweight

2026-05-01 · Xihao Chen, Yangyang Guo, Roger Zimmermann

General AI

Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the …

Review
pending
Role
unreviewed
Read
now
huggingface Score 16.0

IQuest-Coder-V1 Technical Report

2026-03-17 · Jian Yang, Wei Zhang, Shawn Guo, Zhengmao Ye, Lin Jing, Shark Liu, Yizhi Li, Jiajun Wu, Cening Liu, X. Ma, Yuyang Song, Siwei Wu, Yuwen Li, L. Liao, T. Zheng, Ziling Huang, Zelong Huang, Che Liu, Yan Xing, Renyuan Li, Qingsong Cai, Hanxu Yan, Siyue Wang, Shikai Li, Jason Klein Liu, An Huang, Yongsheng Kang, Jinxing Zhang, Chuan Hao, Haowen Wang, Weicheng Gu, Ran Tao, Mingjie Tang, Peihao Wu, Jianzhou Wang, Xianglong Liu, Weifeng Lv, Bryan Dai

General AI

In this report, we introduce the IQuest-Coder-V1 series-(7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic through different phases of the pipe…

Review
pending
Role
unreviewed
Read
now
huggingface Score 16.0

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

2026-03-26 · Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang, Chao Zhang, Chen Zhang, Yuhang Zang, Fei Yuan, Jiakang Yuan, Jiashuo Yu, Jinhui Yin, Haochen Ye, Qian Yao, Bowen Yang, Danni Yang, Kaichen Yang, Ziang Yan, Jun Xu, Yicheng Xu, Wanghan Xu, Xuenan Xu, Chao Xu, Ruiliang Xu, Shuhao Xing, Long Xing, Xinchen Xie, Ling-I Wu, Zijian Wu, Zhenyu Wu, Lijun Wu, Yue Wu, Jianyu Wu, Wen Wu, Fan Wu, Xilin Wei, Qi Wei, Bingli Wang, Rui Wang, Ziyi Wang, Zun Wang, Yi Wang, Haomin Wang, Yizhou Wang, Lintao Wang, Yiheng Wang, Longjiang Wang, Bin Wang, Jian Tong, Zhongbo Tian, Huanze Tang, Chen Tang, Shixiang Tang, Yu Sun, Qiushi Sun, Xuerui Su, Qisheng Su, Chenlin Su, Demin Song, Jin Shi, Fukai Shang, Yuchen Ren, Pengli Ren, Xiaoye Qu, Yuan Qu, Jiantao Qiu, Yu Qiao, Runyu Peng, Tianshuo Peng, Jiahui Peng, Qizhi Pei, Zhuoshi Pan, Linke Ouyang, Wenchang Ning, Yichuan Ma, Zerun Ma, Ningsheng Ma, Runyuan Ma, Chengqi Lyu, Haijun Lv, Han Lv, Lindong Lu, Kuikun Liu, Jiangning Liu, Yuhong Liu, Kai Liu, Hongwei Liu, Zhoumianze Liu, Mengjie Liu, Ziyu Liu, Wenran Liu, Yang Liu, Liwei Liu, Kaiwen Liu, Junyao Lin, Junming Lin, Tianyang Lin, Dahua Lin, Jianze Liang, Linyang Li, Peiji Li, Zonglin Li, Zehao Li, Pengze Li, Guoyan Li, Lingkai Kong, Linglin Jing, Zhenjiang Jin, Feifei Jiang, Qian Jiang, Junhao Huang, Zixian Huang, Haian Huang, Zhouqi Hua, Han Hu, Linfeng Hou, Yinan He, Conghui He, Tianyao He, Xu Guo, Qipeng Guo, Aijia Guo, Yuzhe Gu, Lixin Gu, Jingyang Gong, Qiming Ge, Jiaye Ge, Songyang Gao, Jianfei Gao, Xinyu Fang, Caihua fan, Yue Fan, Yanhui Duan, Zichen Ding, Shengyuan Ding, Xuanlang Dai, Erfei Cui, Ganqu Cui, Pei Chu, Tao Chu, Guangran Cheng, Yu Cheng, Kai Chen, Yongkang Chen, Chiyu Chen, Guanzhou Chen, Qiaosheng Chen, Sitao Chen, Xin Chen, Haojiong Chen, Yicheng Chen, Weihan Cao, Yuhang Cao, Qinglong Cao, Lei Bai

General AI

We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is aug…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.0

On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

2026-03-29 · Chongyang Zhao, Mingsong Li, Haodong Lu, Dong Gong

Research Track A · General AI

Multimodal Continual Instruction Tuning aims to continually enhance Large Vision Language Models (LVLMs) by learning from new data without forgetting previously acquired knowledge. Mixture of Experts (MoE) architectures naturally facilitate this by incrementally adding new experts and expanding routers while keeping th…

Review
pending
Role
unreviewed
Read
now
huggingface Score 16.0

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

2026-03-31 · Shuang Chen, Quanxin Shou, Hangting Chen, Yucheng Zhou, Kaituo Feng, Wenbo Hu, Yi-Fan Zhang, Yunlong Lin, Wenxuan Huang, Mingyang Song, Dasen Dai, Bolin Jiang, Manyuan Zhang, Shi-Xue Zhang, Zhengkai Jiang, Lucas Wang, Zhao Zhong, Yu Cheng, Nanyun Peng

General AI

Unified multimodal models provide a natural and promising architecture for understanding diverse and complex real-world knowledge while generating high-quality images. However, they still rely primarily on frozen parametric knowledge, which makes them struggle with real-world image generation involving long-tail and kn…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.0

Analytic Drift Resister for Non-Exemplar Continual Graph Learning

2026-04-03 · Lei Song, Shihan Guan, Youyong Kong

Research Track A · General AI

Non-Exemplar Continual Graph Learning (NECGL) seeks to eliminate the privacy risks intrinsic to rehearsal-based paradigms by retaining solely class-level prototype representations rather than raw graph examples for mitigating catastrophic forgetting. However, this design choice inevitably precipitates feature drift. As…

Review
pending
Role
unreviewed
Read
now
huggingface Score 16.0

LightThinker++: From Reasoning Compression to Memory Management

2026-04-04 · Yuqi Zhu, Jintian Zhang, Zhenjie Wan, Yujie Luo, Shuofei Qiao, Zhengke Gui, Da Zheng, Lei Liang, Huajun Chen, Ningyu Zhang

General AI

Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.0

Is Prompt Selection Necessary for Task-Free Online Continual Learning?

2026-04-06 · Seoyoung Park, Haemin Lee, Hankook Lee

Research Track A · General AI

Task-free online continual learning has recently emerged as a realistic paradigm for addressing continual learning in dynamic, real-world environments, where data arrive in a non-stationary stream without clear task boundaries and can only be observed once. To consider such challenging scenarios, many recent approaches…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.0

Tree Learning: A Multi-Skill Continual Learning Framework for Humanoid Robots

2026-04-14 · Yifei Yan, Linqi Ye

Research Track A · General AI

As reinforcement learning for humanoid robots evolves from single-task to multi-skill paradigms, efficiently expanding new skills while avoiding catastrophic forgetting has become a key challenge in embodied intelligence. Existing approaches either rely on complex topology adjustments in Mixture-of-Experts (MoE) models…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.0

SCM: Sleep-Consolidated Memory with Algorithmic Forgetting for Large Language Models

2026-04-22 · Saish Sachin Shinde

Research Track A · General AI

We present SCM (Sleep-Consolidated Memory), a research preview of a memory architecture for large language models that draws on neuroscientific principles to address a fundamental limitation in current systems: the absence of persistent, structured, and biologically plausible memory. Existing approaches rely on truncat…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.9

Learning to Forget: Continual Learning with Adaptive Weight Decay

2026-04-29 · Aditya A. Ramesh, Alex Lewandowski, Jürgen Schmidhuber

Research Track A · General AI

Continual learning agents with finite capacity must balance acquiring new knowledge with retaining the old. This requires controlled forgetting of knowledge that is no longer needed, freeing up capacity to learn. Weight decay, viewed as a mechanism for forgetting, can serve this role by gradually discarding information…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.9

Sequential Learning and Catastrophic Forgetting in Differentiable Resistor Networks

2026-05-02 · Maniru Ibrahim

Research Track A

Differentiable physical networks provide a simple setting in which learning can be studied through the interaction between trainable parameters and physical equilibrium constraints. We investigate sequential learning in differentiable resistor networks governed by Kirchhoff's laws. Although individual input-output map…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Dynamic Dual-Granularity Skill Bank for Agentic RL

2026-03-30 · Songjun Tu, Chengdong Xu, Qichao Zhang, Yaocheng Zhang, Xiangyuan Lan, Linjing Li, Dongbin Zhao

General AI

Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that o…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

The Triadic Cognitive Architecture: Bounding Autonomous Action via Spatio-Temporal and Epistemic Friction

2026-03-31 · Davide Di Gioia

General AI

Current autonomous AI agents, driven primarily by Large Language Models (LLMs), operate in a state of cognitive weightlessness: they process information without an intrinsic sense of network topology, temporal pacing, or epistemic limits. Consequently, heuristic agentic loops (e.g., ReAct) can exhibit failure modes in …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Novel Memory Forgetting Techniques for Autonomous AI Agents: Balancing Relevance and Efficiency

2026-04-02 · Payal Fofadiya, Sunil Tiwari

Research Track A · General AI

Long-horizon conversational agents require persistent memory for coherent reasoning, yet uncontrolled accumulation causes temporal decay and false memory propagation. Benchmarks such as LOCOMO and LOCCO report performance degradation from 0.455 to 0.05 across stages, while MultiWOZ shows 78.2% accuracy with 6.8% false …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Stop Wandering: Efficient Vision-Language Navigation via Metacognitive Reasoning

2026-04-02 · Xueying Li, Feng Lyu, Hao Wu, Mingliu Liu, Jia-Nan Liu, Guozi Liu

General AI

Training-free Vision-Language Navigation (VLN) agents powered by foundation models can follow instructions and explore 3D environments. However, existing approaches rely on greedy frontier selection and passive spatial memory, leading to inefficient behaviors such as local oscillation and redundant revisiting. We argue…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Rethinking Model Efficiency: Multi-Agent Inference with Large Models

2026-04-06 · Sixun Dong, Juhua Hu, Steven Li, Wei Wen, Qi Qian

General AI

Most vision-language models (VLMs) apply a large language model (LLM) as the decoder, where the response tokens are generated sequentially through autoregression. Therefore, the number of output tokens can be the bottleneck of the end-to-end latency. However, different models may require vastly different numbers of out…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery

2026-04-09 · Yifang Wang, Rui Sheng, Erzhuo Shao, Yifan Qian, Haotian Li, Nan Cao, Dashun Wang

General AI

Large language models (LLMs) are transforming scientific workflows, not only through their generative capabilities but also through their emerging ability to use tools, reason about data, and coordinate complex analytical tasks. Yet in most human-AI collaborations, the primary outputs, figures, are still treated as sta…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

CoRE: Concept-Reasoning Expansion for Continual Brain Lesion Segmentation

2026-04-28 · Qianqian Chen, Anglin Liu, Jingyang Zhang, Yudong Zhang

Research Track A · General AI

Accurate brain lesion segmentation in MRI is vital for effective clinical diagnosis and treatment planning. Due to high annotation costs and strict data privacy regulations, universal models require employing Continual Learning (CL) to adapt to evolving clinical tasks without losing previously acquired knowledge. Howev…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Detecting Time Series Anomalies Like an Expert: A Multi-Agent LLM Framework with Specialized Analyzers

2026-05-07 · Hyeongwon Kang, Jeongseob Kim, Jinwoo Park, Pilsung Kang

General AI

Recent studies have explored large language models for time-series anomaly detection, yet existing approaches often rely on a single general-purpose model to directly infer anomaly indices or intervals, limiting controllability, interpretability, and reliability for complex anomaly patterns. We propose SAGE (Specialize…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

2026-05-07 · Isaac David, Arthur Gervais

General AI

Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many operational settings, the most accessible artifacts are binary packages rather than source patches or advisory text. This paper asks whether a language-model agent, restricted t…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

2026-05-07 · Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava

General AI

Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcomer searches an unfam…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.5

Fast Spatial Memory with Elastic Test-Time Training

2026-04-08 · Ziqiao Ma, Xueyang Yu, Haoyu Zhen, Yuncong Yang, Joyce Chai, Chuang Gan

Research Track A · General AI

Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.5

WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark

2026-04-13 · Peng Yuan, Yuyang Yin, Yuxuan Cai, Zheng Wei

Research Track B · General AI

Existing browser agent benchmarks face a fundamental trilemma: real-website benchmarks lack reproducibility due to content drift, controlled environments sacrifice realism by omitting real-web noise, and both require costly manual curation that limits scalability. We present WebForge, the first fully automated framewor…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.5

KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance

2026-04-14 · Linhao Yu, Tianmeng Yang, Siyu Ding, Renren Jin, Naibin Gu, Xiangzhao Hao, Shuaiyi Nie, Deyi Xiong, Weichong Yin, Yu Sun, Hua Wu

General AI

RLVR improves reasoning in large language models, but its effectiveness is often limited by severe reward sparsity on hard problems. Recent hint-based RL methods mitigate sparsity by injecting partial solutions or abstract templates, yet they typically scale guidance by adding more tokens, which introduce redundancy, i…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.5

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

2026-04-14 · NVIDIA, Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye, Abhibha Gupta, Abhilash Somasamudramath, Abhinav Khattar, Adeola Adesoba, Adi Renduchintala, Adil Asif, Aditya Agrawal, Aditya Vavre, Ahmad Kiswani, Aishwarya Padmakumar, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Gronskiy, Alex Kondratenko, Alex Neefus, Alex Steiner, Alex Yang, Alexander Bukharin, Alexander Young, Ali Hatamizadeh, Ali Taghibakhshi, Alina Galiautdinova, Alisa Liu, Alok Kumar, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Anahita Bhiwandiwalla, Ananth Subramaniam, Andrew Tao, Anjaney Shrivastava, Anjulie Agrusa, Ankur Srivastava, Ankur Verma, Ann Guan, Anna Shors, Annamalai Chockalingam, Anubhav Mandarwal, Aparnaa Ramani, Arham Mehta, Arti Jain, Arun Venkatesan, Asha Anoosheh, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asli Sabanci Demiroz, Asma Kuriparambil Thekkumpate, Atefeh Sohrabizadeh, Avinash Kaur, Ayush Dattagupta, Barath Subramaniam Anandan, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Benjamin Chislett, Besmira Nushi, Bilal Kartal, Bill Thiede, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Branislav Kisacanin, Brian Yu, Bryan Catanzaro, Buvaneswari Mani, Carlo del Mundo, Chankyu Lee, Chanran Kim, Chantal Hwang, Chao Ni, Charles Wang, Charlie Truong, Cheng-Ping Hsieh, Chenhan Yu, Chenjie Luo, Cherie Wang, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Chris Holguin, Chris Wing, Christian Munley, Christopher Parisien, Chuck Desai, Chunyang Sheng, Collin Neale, Cyril Meurillon, Dakshi Kumar, Dan Gil, Dan Su, Dane Corneil, Daniel Afrimi, Daniel Burkhardt Eliuth Triana, Daniel Egert, Daniel Fatade, Daniel Lo, Daniel Rohrer, Daniel Serebrenik, Daniil Sorokin, Daria Gitman, Daria Levy, Darko Stosic, David Edelsohn, David Messina, David Mosallanezhad, David Tamok, Deena Donia, Deepak Narayanan, Devin O'Kelly, Dheeraj Peri, Dhruv Nathawani, Di Wu, Dima Rekesh, Dina Yared, Divyanshu Kakwani, Dmitry Konyagin Brandon Tuttle, Dong Ahn, Dongfu Jiang, Dorrin Poorkay, Douglas O'Flaherty, Duncan Riach, Dusan Stosic, Dustin Van Stee, Edgar Minasyan, Edward Lin, Eileen Peters Long, Elad Segal, Elena Lantz, Elena Lewis, Ellie Evans, Elliott Ning, Eric Chung, Eric Harper, Eric Pham-Hung, Eric W. Tramel, Erick Galinkin, Erik Pounds, Esti Etrog, Evan Briones, Evan Wu, Evelina Bakhturina, Evgeny Tsykunov, Ewa Dobrowolska, Farshad Saberi Movahed, Farzan Memarian, Fay Wang, Fei Jia, Felipe Soares, Felipe Vieira Frujeri, Feng Chen, Fengguang Lin, Ferenc Galko, Fortuna Zhang, Frankie Siino, Frida Hou, Gantavya Bhatt, Gargi Prasad, Geethapriya Venkataramani, Geetika Gupta, George Armstrong, Gerald Shen, Giulio Borghesi, Gordana Neskovic, Gorkem Batmaz, Grace Lam, Grace Wu, Greg Pauloski, Greyson Davis, Grigor Nalbandyan, Guoming Zhang, Guy Farber, Guyue Huang, Haifeng Qian, Haran Kumar Shiv Kumar, Harry Kim, Harsh Sharma, Hayate Iso, Hayley Ross, Herbert Hum, Herman Sahota, Hexin Wang, Himanshu Soni, Hiren Upadhyay, Huy Nguyen, Iain Cunningham, Ido Galil, Ido Shahaf, Igino Padovani, Igor Gitman, Igor Shovkun, Ikroop Dhillon, Ilya Loshchilov, Ingrid Kelly, Itamar Schen, Itay Levy, Ivan Moshkov, Izik Golan, Izzy Putterman, Jain Tu, Jan Baczek, Jan Kautz, Jane Polak Scowcroft, Janica Rosenberg, Jared Casper, Jarrod Pflum, Jason Grant, Jason Sewall, Jatin Mitra, Jeffrey Glick, Jenny Chen, Jesse Oliver, Jiacheng Xu, Jiafan Zhu, Jialin Song, Jian Zhang, Jiaqi Zeng, Jie Lou, Jill Milton, Jim Chow, Jimmy Zhang, Jinhang Choi, Jining Huang, Jocelyn Huang, Joel Caruso, Joey Conway, Joey Guman, Johan Jatko, John Kamalu, Johnny Greco, Jonathan Cohen, Jonathan Raiman, Joseph Jennings, Joyjit Daw, Juan Yu, Julio Tapia, Junkeun Yi, Jupinder Parmar, Jyothi Achar, Kari Briski, Kartik Mattoo, Katherine Cheung, Katherine Luna, Keith Wyss, Kevin Shih, Kezhi Kong, Khanh Nguyen, Khushi Bhardwaj, Kirill Buryak, Kirthi Shankar Sivamani, Konstantinos Krommydas, Kris Murphy, Krishna C. Puvvada, Krzysztof Pawelec, Kumar Anik, Laikh Tewari, Laya Sleiman, Leo Du, Leon Derczynski, Li Ding, Lilach Ilan, Lingjie Wu, Lizzie Wei, Luis Vega, Lun Su, Maarten Van Segbroeck, Maer Rodrigues de Melo, Magaret Zhang, Mahan Fathi, Makesh Narsimhan Sreedhar, Makesh Sreedhar, Makesh Tarun Chandran, Manuel Reyes Gomez, Maor Ashkenazi, Marc Cuevas, Marc Romeijn, Margaret Zhang, Mark Cai, Mark Gabel, Markus Kliegl, Martyna Patelka, Maryam Moosaei, Matthew Varacalli, Matvei Novikov, Mauricio Ferrato, Mehrzad Samadi, Melissa Corpuz, Meng Xin, Mengdi Wang, Mengru Wang, Meredith Price, Micah Schaffer, Michael Andersch, Michael Boone, Michael Evans, Michael Z Wang, Miguel Martinez, Mikail Khona, Mike Chrzanowski, Mike Hollinger, Mingyuan Ma, Minseok Lee, Mohammad Dabbah, Mohammad Shoeybi, Mostofa Patwary, Nabin Mulepati, Nader Khalil, Najeeb Nabwani, Nancy Agarwal, Nanthini Balasubramaniam, Narimane Hennouni, Narsi Kodukula, Natalie Hereth, Nathaniel Pinckney, Nave Assaf, Negar Habibi, Nestor Qin, Neta Zmora, Netanel Haber, Nick Reamaroon, Nickson Quak, Nidhi Bhatia, Nikhil Jukar, Nikki Pope, Nikolai Ludwig, Nima Tajbakhsh, Nir Ailon, Nirmal Juluru, Nirmalya De, Nowel Pitt, Oleg Rybakov, Oleksii Hrinchuk, Oleksii Kuchaiev, Olivier Delalleau, Oluwatobi Olabiyi, Omer Ullman Argov, Omri Almog, Omri Puny, Oren Tropp, Otavio Padovani, Ouye Xie, Parth Chadha, Pasha Shamis, Paul Gibbons, Pavlo Molchanov, Peter Belcak, Peter Jin, Pinky Xu, Piotr Januszewski, Pooya Jannaty, Prachi Shevate, Pradeep Thalasta, Pranav Prashant Thombre, Prasoon Varshney, Prerana Gambhir, Pritam Gundecha, Przemek Tredak, Qing Miao, Qiyu Wan, Quan Tran Minh, Rabeeh Karimi Mahabadi, Rachel Oberman, Rachit Garg, Rahul Kandu, Raina Zhong, Ran El-Yaniv, Ran Zilberstein, Rasoul Shafipour, Renee Yao, Renjie Pi, Richard Mazzarese, Richard Wang, Rick Izzo, Ridhima Singla, Rima Shahbazyan, Rishabh Garg, Ritika Borkar, Ritu Gala, Riyad Islam, Robert Clark, Robert Hesse, Roger Waleffe, Rohit Varma Kalidindi, Rohit Watve, Roi Koren, Ron Fan, Ruchika Kharwar, Ruisi Cai, Ruoxi Zhang, Russell J. Hewett, Ryan Prenger, Ryan Timbrook, Ryota Egashira, Sadegh Mahdavi, Sagar Singh Ashutosh Joshi, Sahil Modi, Samuel Kriman, Sandeep Pombra, Sanjay Kariyappa, Sanjeev Satheesh, Santiago Pombo, Saori Kaji, Satish Pasumarthi, Saurav Mishra, Saurav Muralidharan, Scott Hara, Sean Narenthiran, Sebastian Rogawski, Seonjin Na, Seonmyeong Bak, Sepehr Sameni, Seth Poulos, Shahar Mor, Shantanu Acharya, Shaona Ghosh Adam Lord, Sharath Turuvekere Sreenivas, Shaun Kotek, Shaya Gharghabi, Shelby Thomas, Sheng-Chieh Lin, Shibani Likhite, Shiqing Fan, Shiyang Chen, Shreya Gopal, Shrimai Prabhumoye, Shubham Pachori, Shubham Toshniwal, Shuo Zhang, Shuoyang Ding, Shyam Renjith, Shyamala Prayaga, Siddhartha Jain, Simeng Sun, Sirisha Rella, Sirshak Das, Smita Ithape, Sneha Harishchandra S, Somshubra Majumdar, Soumye Singhal, Sri Harsha Singudasu, Sriharsha Niverty, Stas Sergienko, Stefana Gloginic, Stefania Alborghetti, Stephen Ge, Stephen McCullough, Sugam Dipak Devare, Suguna Varshini Velury, Sukrit Rao, Sumeet Kumar Barua, Sunny Gai, Suseella Panguluri, Sushil Koundinyan, Swathi Patnam, Sweta Priyadarshi, Swetha Bhendigeri, Syeda Nahida Akter, Sylendran Arunagiri, Tailling Yuan, Talor Abramovich, Tan Bui, Tan Yu, Terry Kong, Thanh Do, Thomas Gburek, Thorgane Marques, Tiffany Moore, Tijmen Blankevoort, Tim Moon, Timothy Ma, Tiyasa Mitra, Tomasz Grzegorzek, Tomer Asida, Tomer Bar Natan, Tomer Keren, Tomer Ronen, Traian Rebedea, Trenton Starkey, Tugrul Konuk, Twinkle Vashishth, Tyler Condensa, Udi Karpas, Ushnish De, Vahid Noorozi, Vahid Noroozi, Vanshil Atul Shah, Veena Vaidyanathan, Venkat Srinivasan, Venmugil Elango, Victor Cui, Vijay Korthikanti, Vikas Mehta, Virginia Adams, Virginia Wu, Vitaly Kurin, Vitaly Lavrukhin, Vladimir Anisimov, Wan Seo, Wanli Jiang, Wasi Uddin Ahmad, Wei Du, Wei Ping, Wei-Ming Chen, Wendy Quan, Wenliang Dai, Wenwen Gao, Will Jennings, William Zhang, Xiaowei Ren, Xiaowen Xin, Xin Li, Yang Yu, Yangyi Chen, Yaniv Galron, Yashaswi Karnati, Yejin Choi, Yev Meyer, Yi-Fu Wu, Yian Zhang, Ying Lin, Yonatan Geifman, Yonggan Fu, Yoshi Suhara, Youngeun Kwon, Yuan Zhang, Yuki Huang, Zach Moshe, Zhilin Wang, Zhiyu Cheng, Zhongbo Zhu, Zhuolin Yang, Zihan Liu, Zijia Chen, Zijie Yan, Zuhair Ahmed

General AI

We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts arch…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.5

ReConText3D: Replay-based Continual Text-to-3D Generation

2026-04-15 · Muhammad Ahmed Ullah Khan, Muhammad Haris Bin Amir, Didier Stricker, Muhammad Zeshan Afzal

Research Track A · General AI

Continual learning enables models to acquire new knowledge over time while retaining previously learned capabilities. However, its application to text-to-3D generation remains unexplored. We present ReConText3D, the first framework for continual text-to-3D generation. We first demonstrate that existing text-to-3D model…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.5

LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning

2026-04-16 · Bowen Ping, Zijun Chen, Tingfeng Hui, Qize Yu, Chenxuan Li, Junchi Yan, Baobao Chang

General AI

Reinforcement Learning (RL) has emerged as a critical driver for enhancing the reasoning capabilities of Large Language Models (LLMs). While recent advancements have focused on reward engineering or data synthesis, few studies exploit the model's intrinsic representation characteristics to guide the training process. I…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.5

AI scientists produce results without reasoning scientifically

2026-04-20 · Martiño Ríos-García, Nawaf Alampara, Chandan Gupta, Indrajeet Mandal, Sajid Mannan, Ali Asghar Aghajani, N. M. Anoop Krishnan, Kevin Maik Jablonka

General AI

Large language model (LLM)-based systems are increasingly deployed to conduct scientific research autonomously, yet whether their reasoning adheres to the epistemic norms that make scientific inquiry self-correcting is poorly understood. Here, we evaluate LLM-based scientific agents across eight domains, spanning workf…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.5

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

2026-04-20 · Guanting Dong, Junting Lu, Junjie Huang, Wanjun Zhong, Longxiang Liu, Shijue Huang, Zhenyu Li, Yang Zhao, Xiaoshuai Song, Xiaoxi Li, Jiajie Jin, Yutao Zhu, Hanbin Wang, Fangyu Lei, Qinyu Luo, Mingyang Chen, Zehui Chen, Jiazhan Feng, Ji-Rong Wen, Zhicheng Dou

General AI

Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limi…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.5

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

2026-04-21 · Venus Team, Sunhao Dai, Yong Deng, Jinzhen Lin, Yusheng Song, Guoqing Wang, Xiaofeng Wu, Yuqi Zhou, Shuo Yang, Zhenzhe Ying, Zhanwei Zhang, Changhua Meng, Weiqiang Wang

General AI

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open-data by improving both data quality and data utilization. We present DR…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.5

Discovering Agentic Safety Specifications from 1-Bit Danger Signals

2026-04-25 · Víctor Gallego

General AI

Can large language model agents discover hidden safety objectives through experience alone? We introduce EPO-Safe (Experiential Prompt Optimization for Safe Agents), a framework where an LLM iteratively generates action plans, receives sparse binary danger warnings, and evolves a natural language behavioral specificati…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.5

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

2026-05-12 · Zhong Guan, Yongjian Guo, Haoran Sun, Wen Huang, Shuai Di, Xiong Jun Wu, Likang Wu, Hongke Zhao

General AI

Asynchronous reinforcement learning improves rollout throughput for large language model agents by decoupling sample generation from policy optimization, but it also introduces a critical failure mode for PPO-style off-policy correction. In heterogeneous training systems, the total importance ratio should ideally be de…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.4

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

2026-04-29 · Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Yuxiang Zhou, Maria Liakata, Nikolaos Aletras

General AI

Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific p…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.4

T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

2026-05-04 · Haixin Wang, Hejie Cui, Chenwei Zhang, Xin Liu, Shuowei Jin, Shijie Geng, Xinyang Zhang, Nasser Zalmout, Zhenyu Shi, Yizhou Sun

General AI

Recent progress in multi-turn reinforcement learning (RL) has significantly improved reasoning LLMs' performances on complex interactive tasks. Despite advances in stabilization techniques such as fine-grained credit assignment and trajectory filtering, instability remains pervasive and often leads to training collapse…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

2026-04-13 · Junlin Liu, Shengnan An, Shuang Zhou, Dan Ma, Shixiong Luo, Ying Xie, Yuan Zhang, Wenling Yuan, Yifan Zhou, Xiaoyu Li, Ziwen Wang, Xuezhi Cao, Xunliang Cai

General AI

Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts (often termed general reasoning) remains under-explored. Unlik…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

Visual Preference Optimization with Rubric Rewards

2026-04-14 · Ya-Qi Yu, Fangyu Hong, Xiangyang Qu, Hao Wang, Gaojie Wu, Qiaoyu Luo, Nuo Xu, Huixin Wang, Wuheng Xu, Yongxin Liao, Zihao Chen, Haonan Li, Ziming Li, Dezhi Peng, Minghui Liao, Jihao Wu, Haoyu Ren, Dandan Tu

General AI

The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Existing pipelines often rely on off-policy perturbations or coarse outcome-based signals, which are not well suited to fine-grained visual reasoning. We propose rDP…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

2026-04-15 · Zhuofeng Li, Yi Lu, Dongfu Jiang, Haoxiang Zhang, Yuyang Bai, Chuan Li, Yu Wang, Shuiwang Ji, Jianwen Xie, Yu Zhang

Research Track A · General AI

The rapid rise in AI conference submissions has driven increasing exploration of large language models (LLMs) for peer review support. However, LLM-based reviewers often generate superficial, formulaic comments lacking substantive, evidence-grounded feedback. We attribute this to the underutilization of two key compone…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

2026-04-16 · Emanuel Tewolde, Xiao Zhang, David Guzman Piedrahita, Vincent Conitzer, Zhijing Jin

General AI

It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet recent works report the opposite trend: LLMs with stronger reasoning capabilities behave _less_ cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings. Indeed, our exp…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

Learning to Think Like a Cartoon Captionist: Incongruity-Resolution Supervision for Multimodal Humor Understanding

2026-04-16 · Hatice Merve Vural, Doga Kukul, Ege Erdem Ozlu, Demir Ekin Arikan, Bob Mankoff, Erkut Erdem, Aykut Erdem

General AI

Humor is one of the few cognitive tasks where getting the reasoning right matters as much as getting the answer right. While recent work evaluates humor understanding on benchmarks such as the New Yorker Cartoon Caption Contest (NYCC), it largely treats it as black-box prediction, overlooking the structured reasoning p…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap

2026-04-17 · Yige Xu, Yongjie Wang, Zizhuo Wu, Kaisong Song, Jun Lin, Zhiqi Shen

General AI

Reasoning in vision-language models (VLMs) has recently attracted significant attention due to its broad applicability across diverse downstream tasks. However, it remains unclear whether the superior performance of VLMs stems from genuine vision-grounded reasoning or relies predominantly on the reasoning capabilities …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text

2026-04-17 · Van-Truong Le

General AI

The complexity of Vietnam's legal texts presents a significant barrier to public access to justice. While Large Language Models offer a promising solution for legal text simplification, evaluating their true capabilities requires a multifaceted approach that goes beyond surface-level metrics. This paper introduces a co…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

Information Router for Mitigating Modality Dominance in Vision-Language Models

2026-04-17 · Seulgi Kim, Mohit Prabhushankar, Ghassan AlRegib

General AI

Vision-language models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance, where predictions rely disproportionately on a single modality. Prior approaches primarily address this issue by steering the model's attention allocation, implicitly assuming…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

Long-Term Memory for VLA-based Agents in Open-World Task Execution

2026-04-17 · Xu Huang, Weixin Mao, Yinhao Li, Hua Chen, Jiabao Zhao

General AI

Vision-Language-Action (VLA) models have demonstrated significant potential for embodied decision-making; however, their application in complex chemical laboratory automation remains restricted by limited long-horizon reasoning and the absence of persistent experience accumulation. Existing frameworks typically treat p…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

2026-04-20 · Xinyu Ma, Mingzhou Xu, Xuebo Liu, Chang Jin, Qiang Wang, Derek F. Wong, Min Zhang

General AI

Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial latent space. While offline teacher guidance and entropy-driven strategies have been proposed to add…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

2026-04-21 · Yi Zhong, Buqiang Xu, Yijun Wang, Zifei Shan, Shuofei Qiao, Guozhou Zheng, Ningyu Zhang

General AI

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

AgentSOC: A Multi-Layer Agentic AI Framework for Security Operations Automation

2026-04-22 · Joyjit Roy, Samaresh Kumar Singh

General AI

Security Operations Centers (SOCs) increasingly encounter difficulties in correlating heterogeneous alerts, interpreting multi-stage attack progressions, and selecting safe and effective response actions. This study introduces AgentSOC, a multi-layered agentic AI framework that enhances SOC automation by integrating pe…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

Benchmarking Source-Sensitive Reasoning in Turkish: Humans and LLMs under Evidential Trust Manipulation

2026-04-27 · Sercan Karakaş, Yusuf Şimşek

General AI

This paper investigates whether source trustworthiness shapes Turkish evidential morphology and whether large language models (LLMs) track this sensitivity. We study the past-domain contrast between -DI and -mIs in controlled cloze contexts where the information source is overtly external, while only its perceived reli…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

Green Shielding: A User-Centric Approach Towards Trustworthy AI

2026-04-27 · Aaron J. Li, Nicolas Sanchez, Hao Huang, Ruijiang Dong, Jaskaran Bains, Katrin Jaradeh, Zhen Xiang, Bo Li, Feng Liu, Aaron Kornblith, Bin Yu

General AI

Large language models (LLMs) are increasingly deployed, yet their outputs can be highly sensitive to routine, non-adversarial variation in how users phrase queries, a gap not well addressed by existing red-teaming efforts. We propose Green Shielding, a user-centric agenda for building evidence-backed deployment guidanc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

2026-04-27 · Yunze Xiao, Vivienne J. Zhang, Chenghao Yang, Ningshan Ma, Weihao Xuan, Jen-tse Huang

General AI

Applications based on large language models (LLMs), such as multi-agent simulations, require population diversity among agents. We identify a pervasive failure mode we term Persona Collapse: agents each assigned a distinct profile nonetheless converge into a narrow behavioral mode, producing a homogeneous simula…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

G²TR: Generation-Guided Visual Token Reduction for Separate-Encoder Unified Multimodal Models

2026-05-12 · Junxian Li, Kai Liu, Zizhong Ding, Zhixin Wang, Zhikai Chen, Renjing Pei, Yulun Zhang

General AI

The development of separate-encoder unified multimodal models (UMMs) comes with a rapidly growing inference cost due to dense visual token processing. In this paper, we focus on understanding-side visual token reduction for improving the efficiency of separate-encoder UMMs. While this topic has been widely studied for …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.3

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

2026-05-12 · Guohui Zhang, XiaoXiao Ma, Jie Huang, Hang Xu, Hu Yu, Siming Fu, Yuming Li, Zeyue Xue, Lin Song, Haoyang Huang, Nan Duan, Feng Zhao

General AI

Recent advances in joint audio-video generation have been remarkable, yet real-world applications demand strong per-modality fidelity, cross-modal alignment, and fine-grained synchronization. Reinforcement Learning (RL) offers a promising paradigm, but its extension to multi-objective and multi-modal joint audio-video …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.2

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

2026-04-29 · Bochao Liu, Zhipeng Qian, Yang Zhao, Xinyuan Jiang, Zihan Liang, Yufei Ma, Junpeng Zhuang, Ben Chen, Shuo Yang, Hongen Wan, Yao Wu, Chenyi Lei, Xiao Liang

General AI

Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but or…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.2

Factorized Latent Reasoning for LLM-based Recommendation

2026-04-29 · Tianqi Gao, Chengkai Huang, Zihan Wang, Cao Liu, Ke Zeng, Lina Yao

General AI

Large language models (LLMs) have recently been adopted for recommendation by framing user preference modeling as a language generation problem. However, existing latent reasoning approaches typically represent user intent with a single latent vector, which struggles to capture the inherently multi-faceted nature of us…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.2

State Beyond Appearance: Diagnosing and Improving State Consistency in Dial-Based Measurement Reading

2026-04-29 · Yuanze Hu, Gen Li, Yuqin Lan, Qingchen Yu, Zhichao Yang, Junwei Jing, Zhaoxin Fan, Xiaotie Deng

General AI

Multimodal large language models (MLLMs) have achieved impressive progress on general multimodal tasks, yet they remain brittle on dial-based measurement reading. In this paper, we study this problem through controlled benchmarks and feature-space probing, and show that current MLLMs not only achieve unsatisfactory acc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.2

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

2026-04-30 · Xin Zhou, Dingkang Liang, Xiwu Chen, Feiyang Tan, Dingyuan Zhang, Hengshuang Zhao, Xiang Bai

General AI

Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. Conversely, while Large Language Models (LLMs) demonstrate impressive reaso…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.2

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

2026-04-30 · Keming Wu, Zuhao Yang, Kaichen Zhang, Shizun Wang, Haowei Zhu, Sicong Leng, Zhongyu Yang, Qijie Wang, Sudong Wang, Ziting Wang, Zili Wang, Hui Zhang, Haonan Wang, Hang Zhou, Yifan Pu, Xingxuan Li, Fangneng Zhan, Bo Li, Lidong Bing, Yuxin Song, Ziwei Liu, Wenhu Chen, Jingdong Wang, Xinchao Wang, Xiaojuan Qi, Shijian Lu, Bin Wang

General AI

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis towa…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.2

What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design

2026-04-30 · Ivan Bercovich

General AI

Terminal-agent benchmarks have become a primary signal for measuring the coding and system-administration capabilities of large language models. As the market for evaluation environments grows, so does the pressure to ship tasks quickly, often without thorough adversarial review of the verification logic. This paper is…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.2

Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems

2026-05-01 · Saeid Jamshidi, Foutse Khomh, Carol Fung, Kawser Wazed Nafi

General AI

The adoption of Internet of Things (IoT) systems at the network edge of smart architectures is increasing rapidly, intensifying the need for security mechanisms that are both adaptive and resource-efficient. In such environments, runtime defence mechanisms are no longer limited to detection alone but become a resource-…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.2

Semantic Risk-Aware Heuristic Planning for Robotic Navigation in Dynamic Environments: An LLM-Inspired Approach

2026-05-04 · Hamza Ahmed Durrani, Rafay Suleman Durrani

General AI

The integration of Large Language Model (LLM) reasoning principles into classical robot path planning represents a rapidly emerging research direction. In this paper, we propose a Semantic Risk-Aware Heuristic (SRAH) planner that encodes LLM-inspired cost functions penalising geometrically cluttered or high-risk zones …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.2

Tool Use as Action: Towards Agentic Control in Mobile Core Networks

2026-05-04 · Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi, Xueli An

General AI

Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes in the design of network entities, interfaces, and procedures. The adoption of agentic AI in next-generation networks is expected to enhance network intelligence and auto…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

Lifelong Embodied Navigation Learning

2026-03-06 · Xudong Wang, Jiahua Dong, Baichen Liu, Qi Lyu, Lianqing Liu, Zhi Han

Research Track A · General AI

Embodied navigation agents powered by large language models have shown strong performance on individual tasks but struggle to continually acquire new navigation skills and suffer from catastrophic forgetting. We formalize this challenge as lifelong embodied navigation learning (LENL), where an agent is required to a…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

2026-03-15 · Jiayuan Du, Yuebing Song, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu, Qijun Chen

Research Track A · General AI

End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

2026-03-26 · Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Yuanheng Zhu, Dongbin Zhao

General AI

On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matching to a one-token sig…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

Towards a Medical AI Scientist

2026-03-30 · Hongtao Wu, Boyun Zheng, Dingjie Song, Yu Jiang, Jianfeng Gao, Lei Xing, Lichao Sun, Yixuan Yuan

General AI

Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, limiting their applicability to clinical medicine, where research is required to be…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning

2026-04-02 · Yang Zhou, Xiaofeng Wang, Hao Shao, Letian Wang, Guosheng Zhao, Jiangnan Shao, Jiagang Zhu, Tingdong Yu, Zheng Zhu, Guan Huang, Steven L. Waslander

General AI

Recently, world-action models (WAM) have emerged to bridge vision-language-action (VLA) models and world models, unifying their reasoning and instruction-following capabilities and spatio-temporal world modeling. However, existing WAM approaches often focus on modeling 2D appearance or latent representations, with limi…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

2026-04-02 · Difan Jiao, Qianfeng Wen, Blair Yang, Zhenwei Tang, Ashton Anderson

General AI

We introduce ThinkTwice, a simple two-phase framework that jointly optimizes LLMs to solve reasoning problems and refine the answers, based on Group Relative Policy Optimization (GRPO). In each pair of training steps, ThinkTwice first optimizes the model on solving reasoning problems, then optimizes it on refining its …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

A Faster Path to Continual Learning

2026-04-13 · Wei Li, Hangjie Yuan, Zixiang Zhao, Borui Kang, Ziwei Liu, Tao Feng

Research Track A

Continual Learning (CL) aims to train neural networks on a dynamic stream of tasks without forgetting previously learned knowledge. Among optimization-based approaches, C-Flat has emerged as a promising solution due to its plug-and-play nature and its ability to encourage uniformly low-loss regions for both new and old…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

Continual Learning for fMRI-Based Brain Disorder Diagnosis via Functional Connectivity Matrices Generative Replay

2026-04-15 · Qianyu Chen, Shujian Yu

Research Track A

Functional magnetic resonance imaging (fMRI) is widely used for studying and diagnosing brain disorders, with functional connectivity (FC) matrices providing powerful representations of large-scale neural interactions. However, existing diagnostic models are trained either on a single site or under full multi-site acce…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning

2026-04-16 · Peifeng Zhang, Zice Qiu, Donghua Yu, Shilei Cao, Juepeng Zheng, Yutong Lu, Haohuan Fu

Research Track A · General AI

In continual visual question answering (VQA), existing Continual Learning (CL) methods are mostly built for symmetric, unimodal architectures. However, modern Vision-Language Models (VLMs) violate this assumption, as their trainable components are inherently asymmetric. This structural mismatch renders VLMs highly pron…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

CI-CBM: Class-Incremental Concept Bottleneck Model for Interpretable Continual Learning

2026-04-16 · Amirhosein Javadi, Tuomas Oikarinen, Tara Javidi, Tsui-Wei Weng

Research Track A · General AI

Catastrophic forgetting remains a fundamental challenge in continual learning, in which models often forget previous knowledge when fine-tuned on a new task. This issue is especially pronounced in class incremental learning (CIL), which is the most challenging setting in continual learning. Existing methods to address …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

Tree of Concepts: Interpretable Continual Learners in Non-Stationary Clinical Domains

2026-04-18 · Dongkyu Cho, Xiyue Li, Samrachana Adhikari, Rumi Chunara

Research Track A · General AI

Continual learning aims to update models under distribution shift without forgetting, yet many high-stakes deployments, such as healthcare, also require interpretability. In practice, models that adapt well (e.g., deep networks) are often opaque, while models that are interpretable (e.g., decision trees) are brittle un…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

Lifecycle-Aware Federated Continual Learning in Mobile Autonomous Systems

2026-04-22 · Beining Wu, Jun Huang

Research Track A

Federated continual learning (FCL) allows distributed autonomous fleets to adapt collaboratively to evolving terrain types across extended mission lifecycles. However, current approaches face several key challenges: 1) they use uniform protection strategies that do not account for the varying sensitivities to forgettin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

Temporally Extended Mixture-of-Experts Models

2026-04-22 · Zeyu Shen, Peter Henderson

Research Track A · General AI

Mixture-of-Experts models, now popular for scaling capacity at fixed inference speed, switch experts at nearly every token. Once a model outgrows available GPU memory, this churn can render optimizations like offloading and pre-fetching ineffective. We make the case that the options framework in reinforcement learning …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

Fine-Tuning Regimes Define Distinct Continual Learning Problems

2026-04-23 · Paul-Tiberiu Iordache, Elena Burceanu

Research Track A · General AI

Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defined by the trainable …

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models

2026-04-25 · Yida Xue, Ningyu Zhang, Tingwei Wu, Zhe Ma, Daxiong Ji, Zhao Wang, Guozhou Zheng, Huajun Chen

General AI

The vast and underexplored ocean plays a critical role in regulating global climate and supporting marine biodiversity, yet artificial intelligence has so far delivered limited impact in this domain due to a fundamental data bottleneck. Specifically, ocean data are highly fragmented across disparate sources and inheren…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

Co-Director: Agentic Generative Video Storytelling

2026-04-27 · Yale Song, Yiwen Song, Nick Losier, Nathan Hodson, Ye Jin, Rhyard Zhu, Yan Xu, Daniel Vlasic, Carina Claassen, Jasmine Leon, Khanh G. LeViet, Zack Chomyn, Joe Timmons, Brett Slatkin, Scott Penberthy, Tomas Pfister

General AI

While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hier…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

2026-04-27 · Qiliang Liang, Hansi Wang, Zhong Liang, Yang Liu

General AI

LLM agents increasingly rely on reusable skills, capability packages that combine instructions, control flow, constraints, and tool calls. In most current agent systems, however, skills are still represented by text-heavy artifacts, including SKILL.md-style documents and structured records whose machine-usable evidence…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

2026-04-27 · Sivajeet Chand, Kevin Nguyen, Peter Kuntz, Alexander Pretschner

Research Track A · General AI

Large language models (LLMs) perform strongly on general-purpose code generation, yet their applicability to enterprise domain-specific languages (DSLs) remains underexplored, especially for repository-scale change generation spanning multiple files and folder structures from a single natural-language (NL) instruction.…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

2026-04-27 · Chenkai Pan, Xinglong Xu, Yuhang Xu, Yujun Wu, Siyuan Li, Jintao Chen, Conghui He, Jingxuan Wei, Cheng Tan

General AI

Reliably transferring specialized human knowledge from text into large language models remains a fundamental challenge in artificial intelligence. Fine-tuning on domain corpora has enabled substantial capability gains, but the process operates without feedback: when a model fails on a domain task, there is no method to…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents

2026-04-27 · Jiaqi Wang, Wenhao Zhang, Weijie Shi, Yaliang Li, James Cheng

General AI

On-policy distillation (OPD) has shown strong potential for transferring reasoning ability from frontier or domain-specific models to smaller students. While effective on static single-turn tasks, its behavior in multi-turn agent settings remains underexplored. In this work, we identify a key limitation of vanilla OPD …

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.0

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

2026-04-28 · Lei Xiong, Kun Luo, Ziyi Xia, Wenbo Zhang, Jin-Ge Yao, Zheng Liu, Jingying Shao, Jianlyu Chen, Hongjin Qian, Xi Yang, Qian Yu, Hao Li, Chen Yue, Xiaan Du, Yuyang Wang, Yesheng Liu, Haiyu Xu, Zhicheng Dou

General AI

Autonomous scientific research has advanced significantly thanks to the development of AI agents. One key step in this process is finding the right scientific literature, whether to explore existing knowledge for a research problem, or to acquire evidence for verifying assumptions and supporting claims. To assess AI age…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.0

MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound

2026-05-12 · Phu-Hoa Pham, Chi-Nguyen Tran, Nguyen Lam Phu Quy, Dao Sy Duy Minh, Huynh Trung Kiet, Long Tran-Thanh

Research Track A · General AI

Streaming decision trees are natural candidates for open-world continual learning, as they perform local updates and enjoy bounded memory and static decision boundaries. Despite these properties, they still fail in online class-incremental learning due to two coupled miscalibrations: (i) their split criterion grows unreliable as th…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Moving Beyond Review: Applying Language Models to Planning and Translation in Reflection

2026-03-30 · Seyed Parsa Neshaei, Richard Lee Davis, Tanja Käser

General AI

Reflective writing is known to support the development of students' metacognitive skills, yet learners often struggle to engage in deep reflection, limiting learning gains. Although large language models (LLMs) have been shown to improve writing skills, their use as conversational agents for reflective writing has prod…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Unsafe2Safe: Controllable Image Anonymization for Downstream Utility

2026-03-30 · Mih Dinh, SouYoung Jin

General AI

Large-scale image datasets frequently contain identifiable or sensitive content, raising privacy risks when training models that may memorize and leak such information. We present Unsafe2Safe, a fully automated pipeline that detects privacy-prone images and rewrites only their sensitive regions using multimodally guide…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

2026-03-31 · Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh

General AI

AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We …

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

ActionParty: Multi-Subject Action Binding in Generative Video Games

2026-04-02 · Alexander Pondaven, Ziyi Wu, Igor Gilitschenski, Philip Torr, Sergey Tulyakov, Fabio Pizzati, Aliaksandr Siarohin

General AI

Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we tackle a fundamental issue of action bin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models

2026-04-06 · Xiangzhao Hao, Zefeng Zhang, Zhenyu Zhang, Linhao Yu, Yao Chen, Yiqian Zhang, Haiyun Guo, Shuohuan Wang, Yu Sun

General AI

Image degradation from blur, noise, compression, and poor illumination severely undermines multimodal understanding in real-world settings. Unified multimodal models that combine understanding and generation within a single architecture are a natural fit for this challenge, as their generative pathway can model the fin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Synthetic Sandbox for Training Machine Learning Engineering Agents

2026-04-06 · Yuhang Zhou, Lizhu Zhang, Yifan Wu, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao, Hong Yan

General AI

As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becomes orders of magnitude more expensive: while SWE tasks can be verified via fast-executing unit tests, MLE verification requires running full ML pipelines -- data prepro…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Vero: An Open RL Recipe for General Visual Reasoning

2026-04-06 · Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu

General AI

What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pip…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

"I See What You Did There": Can Large Vision-Language Models Understand Multimodal Puns?

2026-04-07 · Naen Xu, Jiayi Sheng, Changjiang Li, Chunyi Zhou, Yuyuan Li, Tianyu Du, Jun Wang, Zhihui Fu, Jinbao Li, Shouling Ji

General AI

Puns are a common form of rhetorical wordplay that exploits polysemy and phonetic similarity to create humor. In multimodal puns, visual and textual elements synergize to ground the literal sense and evoke the figurative meaning simultaneously. Although Vision-Language Models (VLMs) are widely used in multimodal unders…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection

2026-04-07 · Hongxu Zhou

General AI

Intrinsic self-correction in Large Language Models (LLMs) frequently fails in open-ended reasoning tasks due to "hallucination snowballing," a phenomenon in which models recursively justify early errors during free-text reflection. While structured feedback can mitigate this issue, existing approaches often rely on e…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest

2026-04-09 · Addison J. Wu, Ryan Liu, Shuyue Stella Li, Yulia Tsvetkov, Thomas L. Griffiths

General AI

Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements. This creates the potential for LLM…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

MARD: A Multi-Agent Framework for Robust Android Malware Detection

2026-04-28 · Xueying Zeng, Youquan Xian, Sihao Liu, Xudong Mou, Yanze Li, Lei Cui, Bo Li

Research Track A · General AI

With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models (LLMs) demonstrate remarkable sem…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

2026-04-28 · Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen, Kai Xu, Quanjun Yin, Ee-Chien Chang

Research Track B · General AI

Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into webpage content to induce unintended actions. This threat is further amplified for screenshot-based web agents, which opera…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

MINER: Mining Multimodal Internal Representation for Efficient Retrieval

2026-05-07 · Weien Li, Rui Song, Zeyu Li, Haochen Liu, Gonghao Zhang, Difan Jiao, Zhenwei Tang, Bowei He, Haolun Wu, Xue Liu, Ye Yuan

General AI

Visual document retrieval has become essential for accessing information in visually rich documents. Existing approaches fall into two camps. Late-interaction retrievers achieve strong quality through fine-grained token-level matching but store hundreds of vectors per page, incurring large index footprints and high ser…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

2026-05-07 · Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin

General AI

Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely purely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajec…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

2026-05-07 · Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao

General AI

Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems, an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches eithe…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

An Executable Benchmarking Suite for Tool-Using Agents

2026-05-10 · Zhiqing Zhong, Zhijing Ye, Jiamin Wang, Xiaodong Yu

Research Track B · General AI

Closed-loop tool-using agents are increasingly evaluated in executable web, code, and micro-task environments, but benchmark reports often conflate workloads, action-generating drivers, and the evidence admitted for systems-facing claims. We present an executable benchmarking suite that makes these objects explicit und…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces

2026-05-10 · Yilin Zhang, Yingkai Hua, Chunyu Wei, Xin Wang, Yueguo Chen

Research Track B · General AI

Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements. Existing approaches either detect deception without task integration or document attacks without proposing defenses. We formalize deception-aware web agent defense and pr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.6

Epistemic Uncertainty for Test-Time Discovery

2026-05-11 · Kainat Riaz, Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Ayesha Mohsin, Aqib Riaz, Ali Subhan, John M. Cioffi

General AI

Automated scientific discovery using large language models relies on identifying genuinely novel solutions. Standard reinforcement learning penalizes high-variance mutations, which leads the policy to prioritize familiar patterns. As a result, the maximum reward plateaus even as the average reward increases. Overcoming…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.5

Pruned Adaptation Modules: A Simple yet Strong Baseline for Continual Foundation Models

2026-03-22 · Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren

Research Track A · General AI

The continual learning literature has rapidly shifted from traditional class incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substanti…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.5

Chameleons do not Forget: Prompt-Based Online Continual Learning for Next Activity Prediction

2026-04-01 · Marwan Hassani, Tamara Verbeek, Sjoerd van Straten

Research Track A

Predictive process monitoring (PPM) focuses on predicting future process trajectories, including next activity predictions. This is crucial in dynamic environments where processes change or face uncertainty. However, current frameworks often assume a static environment, overlooking dynamic characteristics and concept d…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.5

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

2026-04-02 · Xinlei Yu, Zhangquan Chen, Yongbo He, Tianyu Fu, Cheng Yang, Chengming Xu, Yue Ma, Xiaobin Hu, Zhe Cao, Jie Xu, Guibin Zhang, Jiale Tao, Jiayi Zhang, Siyuan Ma, Kaituo Feng, Haojie Huang, Youxing Li, Ronghao Chen, Huacan Wang, Chenglin Wu, Zikun Su, Xiaogang Xu, Kelu Yao, Kun Wang, Chen Gao, Yue Liao, Ruqi Huang, Tao Jin, Cheng Tan, Jiangning Zhang, Wenqi Ren, Yanwei Fu, Yong Liu, Yu Wang, Xiangyu Yue, Yu-Gang Jiang, Shuicheng Yan

Research Track A · General AI

Latent space is rapidly emerging as a native substrate for language-based models. While modern systems are still commonly understood through explicit token-level generation, an increasing body of work shows that many critical internal processes are more naturally carried out in continuous latent space than in human-rea…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.5

When Modalities Remember: Continual Learning for Multimodal Knowledge Graphs

2026-04-03 · Linyu Li, Zhi Jin, Yichi Zhang, Dongming Jin, Yuanpeng He, Haoran Duan, Gadeng Luosang, Nyima Tashi

Research Track A · General AI

Real-world multimodal knowledge graphs (MMKGs) are dynamic, with new entities, relations, and multimodal knowledge emerging over time. Existing continual knowledge graph reasoning (CKGR) methods focus on structural triples and cannot fully exploit multimodal signals from new entities. Existing multimodal knowledge grap…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts

2026-04-07 · Weiyue Li, Ruizhi Qian, Yi Li, Yongce Li, Yunfan Long, Jiahui Cai, Yan Luo, Mengyu Wang

General AI

Large language models (LLMs) are widely explored for reasoning-intensive research tasks, yet resources for testing whether they can infer scientific conclusions from structured biomedical evidence remain limited. We introduce MedConclusion, a large-scale dataset of 5.7M PubMed structured abstracts for biomedical conclu…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

2026-04-13 · Hanqi Xiao, Vaidehi Patil, Zaid Khan, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal

General AI

As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. We propose a novel p…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs

2026-04-17 · Sai Srinivas Kancheti, Aditya Sanjiv Kanade, Vineeth N. Balasubramanian, Tanuja Ganu

General AI

Multimodal Reasoning Models (MRMs) leveraging Chain-of-Thought (CoT) based thinking have revolutionized mathematical and logical problem-solving. However, we show that this paradigm struggles with generalized spatial intelligence. We perform a comprehensive evaluation of seventeen models across thirteen spatial benchma…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation

2026-04-18 · Bo Li, Ningyuan Deng, Tianyu Dong, Shaobo Wang, Shaolin Zhu, Lijie Wen

General AI

Multimodal large language models (MLLMs) have shown impressive capabilities, yet they often struggle to effectively capture the fine-grained textual information within images crucial for accurate image translation. This often leads to a modality gap between visual text inputs and textual inputs/outputs for image transl…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale

2026-04-19 · Xinyu Zhu, Yuzhu Cai, Zexi Liu, Cheng Wang, Fengyang Li, Wenkai Jin, Wanxu Liu, Zehao Bing, Bingyang Zheng, Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xianghe Pang, Yaxin Du, Tingjia Miao, Yuzhi Zhang, Ruoxue Liao, Zhaohan Ding, Linfeng Zhang, Yanfeng Wang, Weinan E, Siheng Chen

General AI

The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we pres…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics

2026-04-19 · Yueyang Ding, HaoPeng Zhang, Rui Dai, Yi Wang, Tianyu Zong, Kaikui Liu, Xiangxiang Chu

General AI

Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge …

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

Latent Preference Modeling for Cross-Session Personalized Tool Calling

2026-04-20 · Yejin Yoon, Minseo Kim, Taeuk Kim

General AI

Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge

2026-04-20 · Sua Lee, Sanghee Park, Jinbae Im

General AI

Multimodal Large Language Models (MLLMs) have been increasingly used as automatic evaluators, a paradigm known as MLLM-as-a-Judge. However, their reliability and vulnerabilities to biases remain underexplored. We find that many MLLM judges fail to reliably integrate key visual or textual cues, yielding unreliable evalua…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

OpenGame: Open Agentic Coding for Games

2026-04-20 · Yilei Jiang, Jinyuan Hu, Qianyin Xiao, Yaozhi Zheng, Ruize Ma, Kaituo Feng, Jiaming Han, Tianshuo Peng, Kaixuan Fan, Manyuan Zhang, Xiangyu Yue

General AI

Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (LLMs) and code agents now solve isolated programming tasks with ease, they consis…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models

2026-04-23 · Vipula Rawte, Ryan Rossi, Franck Dernoncourt, Nedim Lipka

General AI

Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation poses significant risks in high-stakes domains such as healthcare, law, and scientific communication, where trust and veri…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.5

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

2026-04-26 · Fanqing Meng, Lingxiao Du, Zijian Wu, Guanzheng Chen, Xiangyan Liu, Jiaqi Liao, Chonghe Jiang, Zhenglin Wan, Jiawei Gu, Pengfei Zhou, Rui Huang, Ziqi Zhao, Shengyuan Ding, Ailing Yu, Bo Peng, Bowei Xia, Hao Sun, Haotian Liang, Ji Xie, Jiajun Chen, Jiajun Song, Liu Yang, Ming Xu, Qionglin Qiu, Runhao Fu, Shengfang Zhai, Shijian Wang, Tengfei Ma, Tianyi Wu, Weiyang Jin, Yan Wang, Yang Dai, Yao Lai, Youwei Shu, Yue Liu, Yunzhuo Hao, Yuwei Niu, Jinkai Huang, Jiayuan Zhuo, Zhennan Shen, Linyu Wu, Cihang Xie, Yuyin Zhou, Jiaheng Zhang, Zeyu Zheng, Mengkang Hu, Michael Qizhe Shieh

General AI

Language-model agents are increasingly used as persistent coworkers that assist users across multiple working days. During such workflows, the surrounding environment may change independently of the agent: new emails arrive, calendar entries shift, knowledge-base records are updated, and evidence appears across images,…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.4

AcademiClaw: When Students Set Challenges for AI Agents

2026-05-04 · Junjie Yu, Pengrui Lu, Weiye Si, Hongliang Lu, Jiabao Wu, Kaiwen Tao, Kun Wang, Lingyu Yang, Qiran Zhang, Xiuting Guo, Xuanyu Wang, Yang Wang, Yanjie Wang, Yi Yang, Zijian Hu, Ziyi Yang, Zonghan Zhou, Binghao Qiang, Borui Zhang, Chenning Li, Enchang Zhang, Feifan Chen, Feng Jian, Fengyin Sun, Hao Qiu, Hao Zheng, Haoran Zhu, Hongyu Liu, Jianbin Deng, Jiaxin Song, Jiaying Chi, Jiayou Shi, Jie Fang, Jinghui Zhong, Jingyu Zhou, Jinze Li, Junfeng Yi, Junyan Yu, Junzhi Xue, Ni Song, Pengyi Chen, Qi Chen, Quansheng Li, Rui Tao, Shenghai Gong, Shenhang Lu, Tianqi Shen, Tianxiang Zhu, Tiehan Kang, Tingyu Li, Wendi Wu, Xiao Shen, Xiao Zhou, Xiaotao Zhang, Xinrong Li, Xuankun Yang, Xun Zhang, Yan Li, Ye Lu, Yi Wang, Yibo Zhou, Yichi Zhang, Yihao Sun, Yijun Huang, Yixin Zhu, Yixuan Wu, Yuchen Sun, Yue Wu, Yuheng Sun, Yukun Li, Yutian Tu, Yuxuan Qin, Yuzhuo Wu, Zeyu Li, Zhengyu Lou, Zhenning Ran, Zizhu He, Pengfei Liu

General AI

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' real academic workflows…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.4

Automated In-the-Wild Data Collection for Continual AI Generated Image Detection

2026-05-04 · Thanasis Pantsios, Dimitrios Karageorgiou, Christos Koutlis, George Karantaidis, Olga Papadopoulou, Symeon Papadopoulos

Research Track A · General AI

The rapid advancement of generative Artificial Intelligence (AI) has introduced significant challenges for reliable AI-generated image detection. Existing detectors often suffer from performance degradation under distribution shifts and when encountering newly emerging generative models. In this work, we propose a data…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

2026-04-13 · Yuqian Yuan, Wenqiao Zhang, Juekai Lin, Yu Zhong, Mingjian Gao, Binhe Yu, Yunqi Cao, Wentong Li, Yueting Zhuang, Beng Chin Ooi

General AI

Large Multimodal Models (LMMs) have achieved remarkable progress in general-purpose vision-language understanding, yet they remain limited in tasks requiring precise object-level grounding, fine-grained spatial reasoning, and controllable visual manipulation. In particular, existing systems often struggle to identify …

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Agentic Discovery with Active Hypothesis Exploration for Visual Recognition

2026-04-14 · Jaywon Koo, Jefferson Hernandez, Ruozhen He, Hanjie Chen, Chen Wei, Vicente Ordonez

General AI

We introduce HypoExplore, an agentic framework that formulates neural architecture discovery for visual recognition as a hypothesis-driven scientific inquiry. Given a human-specified high-level research direction, HypoExplore ideates, implements, evaluates, and improves neural architectures through evolutionary branchi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Don't Show Pixels, Show Cues: Unlocking Visual Tool Reasoning in Language Models via Perception Programs

2026-04-14 · Muhammad Kamran Janjua, Hugo Silva, Di Niu, Bahador Rashidi

General AI

Multimodal language models (MLLMs) are increasingly paired with vision tools (e.g., depth, flow, correspondence) to enhance visual reasoning. However, despite access to these tool-generated visual cues, MLLMs often fail to benefit from them. Existing approaches typically feed raw tool outputs into the model, but these …

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models

2026-04-14 · Han Bao, Penghao Zhang, Yue Huang, Zhengqing Yuan, Yanchi Ru, Rui Su, Yujun Zhou, Xiangqi Wang, Kehan Guo, Nitesh V Chawla, Yanfang Ye, Xiangliang Zhang

General AI

Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present PolicyBench, the first large-scale cross-syst…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback

2026-04-14 · Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu

Research Track B · General AI

Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces, where sub-pixel accuracy is required to interact with dense IDE elements, remains underexplored. Existing a…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Agentic Microphysics: A Manifesto for Generative AI Safety

2026-04-16 · Federico Pierucci, Matteo Prandi, Marcantonio Bracale Syrnikov, Marcello Galisai, Piercosma Bisconti

General AI

This paper advances a methodological proposal for safety research in agentic AI. As systems acquire planning, memory, tool use, persistent identity, and sustained interaction, safety can no longer be analysed primarily at the level of the isolated model. Population-level risks arise from structured interaction among ag…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Blue Data Intelligence Layer: Streaming Data and Agents for Multi-source Multi-modal Data-Centric Applications

2026-04-16 · Moin Aminnaseri, Farima Fatahi Bayat, Nikita Bhutani, Jean-Flavien Bussotti, Kevin Chan, Rafael Li Chen, Yanlin Feng, Jackson Hassell, Estevam Hruschka, Eser Kandogan, Hannah Kim, James Levine, Seiji Maekawa, Jalal Mahmud, Kushan Mitra, Naoki Otani, Pouya Pezeshkpour, Nima Shahbazi, Chen Shen, Dan Zhang

General AI

NL2SQL systems aim to address the growing need for natural language interaction with data. However, real-world information rarely maps to a single SQL query because (1) users express queries iteratively (2) questions often span multiple data sources beyond the closed-world assumption of a single database, and (3) queri…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Feedback-Driven Execution for LLM-Based Binary Analysis

2026-04-16 · XiangRui Zhang, Qiang Li, Haining Wang

General AI

Binary analysis increasingly relies on large language models (LLMs) to perform semantic reasoning over complex program behaviors. However, existing approaches largely adopt a one-pass execution paradigm, where reasoning operates over a fixed program representation constructed by static analysis tools. This formulation …

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies

2026-04-16 · Alexey Khoroshilov, Alexey Chernysh, Orkhan Ekhtibarov, Nini Kamkia, Dmitry Zmitrovich

General AI

Large language models have demonstrated strong performance on general-purpose programming tasks, yet their ability to generate executable algorithmic trading strategies remains underexplored. Unlike standard code benchmarks, trading-strategy generation requires simultaneous mastery of domain-specific financial logic, k…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis

2026-04-17 · Vitor F. Grizzi, Thang Duc Pham, Luke N. Pretzie, Jiayi Xu, Murat Keceli, Cong Liu

General AI

Computational X-ray absorption near-edge structure (XANES) is widely used to probe local coordination environments, oxidation states, and electronic structure in chemically complex systems. However, the use of computational XANES at scale is constrained more by workflow complexity than by the underlying simulation meth…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Learning to Reason with Insight for Informal Theorem Proving

2026-04-17 · Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song

General AI

Although most automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with the strengths of large language models (LLMs) in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the diff…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

SwanNLP at SemEval-2026 Task 5: An LLM-based Framework for Plausibility Scoring in Narrative Word Sense Disambiguation

2026-04-17 · Deshan Sumanathilaka, Nicholas Micallef, Julian Hough, Saman Jayasinghe

General AI

Recent advances in language models have substantially improved Natural Language Understanding (NLU). Although widely used benchmarks suggest that Large Language Models (LLMs) can effectively disambiguate, their practical applicability in real-world narrative contexts remains underexplored. SemEval-2026 Task 5 addresses…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization

2026-04-17 · Siddhant Bharadwaj, Ashish Vashist, Fahimul Aleem, Shruti Vyas

General AI

Image geolocalization has traditionally been addressed through retrieval-based place recognition or geometry-based visual localization pipelines. Recent advances in Vision-Language Models (VLMs) have demonstrated strong zero-shot reasoning capabilities across multimodal tasks, yet their performance in geographic infere…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Phase-Scheduled Multi-Agent Systems for Token-Efficient Coordination

2026-04-19 · Mohit Dubey

Research Track B · General AI

Multi-agent systems (MAS) powered by large language models suffer from severe token inefficiency arising from two compounding sources: (i) unstructured parallel execution, where all agents activate simultaneously irrespective of input readiness; and (ii) unrestricted context sharing, where every agent receives the full…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Document-as-Image Representations Fall Short for Scientific Retrieval

2026-04-20 · Ghazal Khalighinejad, Raghuveer Thirukovalluru, Alexander H. Oh, Bhuwan Dhingra

General AI

Many recent document embedding models are trained on document-as-image representations, embedding rendered pages as images rather than the underlying source. Meanwhile, existing benchmarks for scientific document retrieval, such as ArXivQA and ViDoRe, treat documents as images of pages, implicitly favoring such represe…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Sessa: Selective State Space Attention

2026-04-20 · Liubomyr Horbatko

General AI

Modern sequence models are dominated by Transformers, where self-attention mixes information from the visible context in an input-dependent way. However, when retrieval is not sharp and attention remains diffuse over an effective support $S_{\mathrm{eff}}(t)$, the influence of any individual token is diluted, typically…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

RespondeoQA: a Benchmark for Bilingual Latin-English Question Answering

2026-04-22 · Marisa Hudspeth, Patrick J. Burns, Brendan O'Connor

General AI

We introduce a benchmark dataset for question answering and translation in bilingual Latin and English settings, containing about 7,800 question-answer pairs. The questions are drawn from Latin pedagogical sources, including exams, quizbowl-style trivia, and textbooks ranging from the 1800s to the present. After automa…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Self-Awareness before Action: Mitigating Logical Inertia via Proactive Cognitive Awareness

2026-04-22 · Fulong Fan, Peilin Liu, Fengzhe Liu, Shuyan Yang, Gang Yan

General AI

Large language models perform well on many reasoning tasks, yet they often lack awareness of whether their current knowledge or reasoning state is complete. In non-interactive puzzle settings, the narrative is fixed and the underlying structure is hidden; once a model forms an early hypothesis under incomplete premises…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

StructMem: Structured Memory for Long-Horizon Behavior in LLMs

2026-04-23 · Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong, Yunzhi Yao, Yuqi Zhu, Lun Du, Shumin Deng

Research Track A · General AI

Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approaches face a fundamental trade-off: flat memory is efficient but fails to model relational structure, while graph-based m…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Task-Driven Co-Design of Heterogeneous Multi-Robot Systems

2026-04-23 · Maximilian Stralz, Meshal Alharbi, Yujun Huang, Gioele Zardini

General AI

Designing multi-agent robotic systems requires reasoning across tightly coupled decisions spanning heterogeneous domains, including robot design, fleet composition, and planning. Much effort has been devoted to isolated improvements in these domains, whereas system-level co-design considering trade-offs and task requir…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Rethinking Math Reasoning Evaluation: A Robust LLM-as-a-Judge Framework Beyond Symbolic Rigidity

2026-04-24 · Erez Yosef, Oron Anschel, Shunit Haviv Hakimi, Asaf Gendler, Adam Botach, Nimrod Berman, Igor Kviatkovsky

General AI

Recent advancements in large language models have led to significant improvements across various tasks, including mathematical reasoning, which is used to assess models' intelligence in logical reasoning and problem-solving. Models are evaluated on mathematical reasoning benchmarks by verifying the correctness of the f…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Can LLMs Act as Historians? Evaluating Historical Research Capabilities of LLMs via the Chinese Imperial Examination

2026-04-27 · Lirong Gao, Zeqing Wang, Yuyan Cai, Jiayi Deng, Yanmei Gu, Yiming Zhang, Jia Zhou, Yanfei Zhang, Junbo Zhao

General AI

While Large Language Models (LLMs) have increasingly assisted in historical tasks such as text processing, their capacity for professional-level historical reasoning remains underexplored. Existing benchmarks primarily assess basic knowledge breadth or lexical understanding, failing to capture the higher-order skills, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Aligning Flow Map Policies with Optimal Q-Guidance

2026-05-12 · Christos Ziakas, Alessandra Russo, Avishek Joey Bose

General AI

Generative policies based on expressive model classes, such as diffusion and flow matching, are well-suited to complex control problems with highly multimodal action distributions. Their expressivity, however, comes at a significant inference cost: generating each action typically requires simulating many steps of the …

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.3

Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning

2026-05-12 · Hannes Büchi, Manon Flageat, Eduardo Sebastián, Amanda Prorok

General AI

Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve, and to do so at the right moment. Yet, current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity are still limited by the fact that they bind fixed behaviors to fixed agent identities…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.2

AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development

2026-05-04 · Yuecai Zhu, Nikolaos Tsantalis, Peter C. Rigby

General AI

The promise of Large Language Models in automated software engineering is often measured by functional correctness, overlooking the critical issue of long-term maintainability. This paper presents a systematic audit of technical debt in AI-generated software, revealing that AI does not eliminate flaws but rather introd…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.2

EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs

2026-05-04 · Ruichao Liang, Jing Chen, Xianglong Li, Huangpeng Gu, Yebo Feng, Yue Xue, Cong Wu, Yang Liu

General AI

Smart contract vulnerabilities in Decentralized Finance cause billions of dollars in losses every year, yet the security community faces a critical bottleneck: identifying a vulnerability is not the same as proving it is exploitable. Manual PoC construction is prohibitively labor-intensive, leaving most disclosed vu…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.0

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

2026-03-20 · Chiyu Ma, Shuo Yang, Kexin Huang, Jinda Lu, Haoming Meng, Shangshang Wang, Bolin Ding, Soroush Vosoughi, Guoyin Wang, Jingren Zhou

General AI

We present Future-KL Influenced Policy Optimization (FIPO), a reinforcement learning algorithm designed to overcome reasoning bottlenecks in large language models. While GRPO-style training scales effectively, it typically relies on outcome-based rewards (ORM) that distribute a global advantage uniformly across every t…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.0

GEMS: Agent-Native Multimodal Generation with Memory and Skills

2026-03-30 · Zefeng He, Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Yu Cheng, Yang Yang

General AI

Recent multimodal generation models have achieved remarkable progress on general-purpose generation tasks, yet continue to struggle with complex instructions and specialized downstream tasks. Inspired by the success of advanced agent frameworks such as Claude Code, we propose GEMS (Agent-Native Multimodal GEneration wi…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.0

AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

2026-03-31 · Zhaohe Liao, Kaixun Jiang, Zhihang Liu, Yujie Wei, Junqiu Yu, Quanhao Li, Hong-Tao Yu, Pandeng Li, Yuzheng Wang, Zhen Xing, Shiwei Zhang, Chen-Wei Xie, Yun Zheng, Xihui Liu

General AI

Although image generation has boosted various applications via its rapid evolution, whether the state-of-the-art models are able to produce ready-to-use academic illustrations for papers is still largely unexplored. Directly comparing or evaluating the illustration with VLM is native but requires oracle multi-modal und…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.0

Dual-Imbalance Continual Learning for Real-World Food Recognition

2026-03-31 · Xiaoyan Zhang, Jiangpeng He

Research Track A · General AI

Visual food recognition in real-world dietary logging scenarios naturally exhibits severe data imbalance, where a small number of food categories appear frequently while many others occur rarely, resulting in long-tailed class distributions. In practice, food recognition systems often operate in a continual learning se…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.0

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

2026-04-02 · Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, Jiacheng Zhu, Xuan Jiang, Sirui Li, Cathy Wu, Bryan Kian Hsiang Low, Jinhua Zhao, Paul Pu Liang

General AI

Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We present CORAL, the first …

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.0

ClawArena: Benchmarking AI Agents in Evolving Information Environments

2026-04-05 · Haonian Ji, Kaiwen Xiong, Siwei Han, Peng Xia, Shi Qiu, Yiyang Zhou, Jiaqi Liu, Jinlong Li, Bingzhou Li, Zeyu Zheng, Cihang Xie, Huaxiu Yao

General AI

AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface through corrections rath…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.0

SkillX: Automatically Constructing Skill Knowledge Bases for Agents

2026-04-06 · Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang, Shuofei Qiao, Kexin Cao, Guozhou Zheng, Xiang Qi, Peng Zhang, Shumin Deng

General AI

Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation and repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this …

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.0

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

2026-04-06 · Chaoyou Fu, Haozhi Yuan, Yuhao Dong, Yi-Fan Zhang, Yunhang Shen, Xiaoxing Hu, Xueying Li, Jinsen Su, Chengwu Long, Xiaoyao Xie, Yongkang Xie, Xiawu Zheng, Xue Yang, Haoyu Cao, Yunsheng Wu, Ziwei Liu, Xing Sun, Caifeng Shan, Ran He

General AI

With the rapid advancement of video understanding, existing benchmarks are becoming increasingly saturated, exposing a critical discrepancy between inflated leaderboard scores and real-world model capabilities. To address this widening gap, we introduce Video-MME-v2, a comprehensive benchmark designed to rigorously eva…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.0

CobwebTM: Probabilistic Concept Formation for Lifelong and Hierarchical Topic Modeling

2026-04-15 · Karthik Singaravadivelan, Anant Gupta, Zekun Wang, Christopher MacLellan, Christopher J. MacLellan

Research Track A

Topic modeling seeks to uncover latent semantic structure in text corpora with minimal supervision. Neural approaches achieve strong performance but require extensive tuning and struggle with lifelong learning due to catastrophic forgetting and fixed capacity, while classical probabilistic models lack flexibility and a…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.0

AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning

2026-04-17 · Guransh Singh

Research Track A

Adapting pre-trained vision-language models (VLMs) for robotic control requires injecting high-magnitude continuous gradients from a flow-matching action expert into a backbone trained exclusively with cross-entropy. This cross-modal gradient asymmetry - the spectral dimensionality mismatch between low-rank MSE regress…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.0

Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models

2026-04-22 · Sachin Kumar

Research Track B · General AI

Can small language models achieve strong tool-use performance without complex adaptation mechanisms? This paper investigates this question through Meta-Tool, a controlled empirical study comparing hypernetwork-based LoRA adaptation against carefully designed few-shot prompting. Using a Llama-3.2-3B-Instruct backbone, w…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.0

Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation

2026-04-23 · Yi-Ling Liu, Melvin Laux, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam

Research Track A · General AI

Autonomous underwater vehicles are required to perform multiple tasks adaptively and in an explainable manner under dynamic, uncertain conditions and limited sensing, challenges that classical controllers struggle to address. This demands robust, generalizable, and inherently interpretable control policies for reliable…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.0

Step-Audio-R1.5 Technical Report

2026-04-28 · Yuxin Zhang, Xiangyu Tony Zhang, Daijiao Liu, Fei Tian, Yayue Deng, Jun Chen, Qingjian Lin, Haoyang Zhang, Yuxin Li, Jinglan Gong, Yechang Huang, Liang Zhao, Chengyuan Yao, Hexin Liu, Eng Siong Chng, Xuerui Yang, Gang Yu, Xiangyu Zhang, Daxin Jiang

General AI

Recent advancements in large audio language models have extended Chain-of-Thought (CoT) reasoning into the auditory domain, enabling models to tackle increasingly complex acoustic and spoken tasks. To elicit and sustain these extended reasoning chains, the prevailing paradigm -- driven by the success of text-based reas…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.0

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

2026-05-07 · Yuxing Liu, Jianyu Wang, Tong Zhang

Research Track A · General AI

Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., forgetting less while achieving the same o…

Review
pending
Role
unreviewed
Read
now
huggingface Score 14.0

Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

2026-05-09 · Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, H. Vincent Poor, Christopher G. Brinton

General AI

Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraint…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

WebNavigator: Global Web Navigation via Interaction Graph Retrieval

2026-03-20 · Xuanwang Zhang, Yuteng Han, Jinnan Qi, Mulong Xie, Zhen Wu, Xinyu Dai

Research Track B · General AI

Despite significant advances in autonomous web navigation, current methods remain far from human-level performance in complex web environments. We argue that this limitation stems from Topological Blindness, where agents are forced to explore via trial-and-error without access to the global topological structure of the…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

2026-03-26 · Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava

General AI

We present an empirical study of how far general-purpose coding agents, without hardware-specific training, can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming

2026-03-26 · Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz

General AI

Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-referenced overhead imagery, enabling GPS-denied localization and navigation. Existing methods almost universally formulate CVGL as an image-retrieval problem in a contrastively trained embedding space. This ties per…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems

2026-03-27 · Shanglin Wu, Yuyang Luo, Yueqing Liang, Kaiwen Shi, Yanfang Ye, Ali Payani, Kai Shu

Research Track A · General AI

Large language model (LLM) multi-agent systems can scale along two distinct dimensions: by increasing the number of agents and by improving through accumulated experience over time. Although prior work has studied these dimensions separately, their interaction under realistic cost constraints remains unclear. In this p…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Fine-Tuning Large Language Models for Cooperative Tactical Deconfliction of Small Unmanned Aerial Systems

2026-03-30 · Iman Sharifi, Alex Zongo, Peng Wei

General AI

The growing deployment of small Unmanned Aerial Systems (sUASs) in low-altitude airspaces has increased the need for reliable tactical deconfliction under safety-critical constraints. Tactical deconfliction involves short-horizon decision-making in dense, partially observable, and heterogeneous multi-agent environments…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Can Commercial LLMs Be Parliamentary Political Companions? Comparing LLM Reasoning Against Romanian Legislative Expuneri de Motive

2026-03-31 · Iulian Lucău, Adelin-George Voicu

General AI

This paper evaluates whether commercial large language models (LLMs) can function as reliable political advisory tools by comparing their outputs against official legislative reasoning. Using a dataset of 15 Romanian Senate law proposals paired with their official explanatory memoranda (expuneri de motive), we test six…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy

2026-03-31 · Shi Li, Vinkle Srivastav, Nicolas Chanel, Saurav Sharma, Nabani Banik, Lorenzo Arboit, Kun Yuan, Pietro Mascagni, Nicolas Padoy

General AI

Surgical procedures are inherently complex and risky, requiring extensive expertise and constant focus to navigate evolving intraoperative scenes well. Computer-assisted systems such as surgical visual question answering (VQA) offer promise for education and intraoperative support. Current surgical VQA research largel…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Think Anywhere in Code Generation

2026-03-31 · Xue Jiang, Tianyu Zhang, Ge Li, Mengyang Liu, Taozhi Chen, Zhenhua Xu, Binhua Li, Wenpin Jiao, Zhi Jin, Yongbin Li, Yihong Dong

General AI

Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer. However, this approach suffers from critical limitations in code generation, where upfront thinking is often insufficient as problems' full complexity only reveals itself duri…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning

2026-04-02 · Bangji Yang, Hongbo Ma, Jiajun Fan, Ge Liu

General AI

Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning quality or require comp…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Steerable Visual Representations

2026-04-02 · Jona Ruthardt, Manu Gaur, Deva Ramanan, Makarand Tapaswi, Yuki M. Asano

General AI

Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to focus on the most salient visual cues in the image, with no way to direct them towar…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

2026-04-02 · Gengsheng Li, Tianyu Yang, Junfeng Fang, Mingyang Song, Mao Zheng, Haiyun Guo, Dan Zhang, Jinqiao Wang, Tat-Seng Chua

General AI

Reinforcement learning with verifiable rewards (RLVR) has become a standard paradigm for post-training large language models. While Group Relative Policy Optimization (GRPO) is widely adopted, its coarse credit assignment uniformly penalizes failed rollouts, lacking the token-level focus needed to efficiently address s…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

RL-Driven Sustainable Land-Use Allocation for the Lake Malawi Basin

2026-04-04 · Ying Yao

Research Track A · General AI

Unsustainable land-use practices in ecologically sensitive regions threaten biodiversity, water resources, and the livelihoods of millions. This paper presents a deep reinforcement learning (RL) framework for optimizing land-use allocation in the Lake Malawi Basin to maximize total ecosystem service value (ESV). Drawin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation

2026-04-06 · Hengrui Gu, Xiaotian Han, Yujing Bian, Kaixiong Zhou

General AI

Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models (LLMs). However, it faces a fundamental limitation termed restricted exploration, where the policy rapidly converges to a narrow set of solutions. While entropy regularization is…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache

2026-04-07 · Shao Wang, Rui Ren, Lin Gui

General AI

The serving paradigm of large language models (LLMs) is rapidly shifting towards complex multi-agent workflows where specialized agents collaborate over massive shared contexts. While Low-Rank Adaptation (LoRA) enables the efficient co-hosting of these specialized agents on a single base model, it introduces a critical…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives

2026-04-07 · Changgeon Ko, Jisu Shin, Hoyun Song, Huije Lee, Eui Jun Hwang, Jong C. Park

General AI

Large language model (LLM) agents are increasingly acting as human delegates in multi-agent environments, where a representative agent integrates diverse peer perspectives to make a final decision. Drawing inspiration from social psychology, we investigate how the reliability of this representative agent is undermined …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

UAVReason: A Unified, Large-Scale Benchmark for Multimodal Aerial Scene Reasoning and Generation

2026-04-07 · Jintao Sun, Hu Zhang, Donglin Di, Gangyi Ding, Zhedong Zheng

General AI

Vision-Language models (VLMs) have demonstrated remarkable capability in ground-view visual understanding but often fracture when deployed on high-altitude Unmanned Aerial Vehicles (UAVs). The failure largely stems from a pronounced domain shift, characterized by tiny and densely packed objects, repetitive textures, an…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Joint Optimization of Reasoning and Dual-Memory for Self-Learning Diagnostic Agent

2026-04-08 · Bingxuan Li, Simo Du, Yue Guo

Research Track A · General AI

Clinical expertise improves not only by acquiring medical knowledge, but also by accumulating experience that yields reusable diagnostic patterns. Recent LLM-based diagnostic agents have shown promising progress in clinical reasoning for decision support. However, most approaches treat cases independently, limiting experie…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning

2026-04-09 · Shiwan Zhao, Zhihu Wang, Xuyang Zhao, Jiaming Zhou, Caiyue Xu, Chenfei Liu, Liting Zhang, Yuhang Jia, Yanzhe Zhang, Hualong Yu, Zichen Xu, Qicheng Li, Yong Qin

Research Track A · General AI

Post-training has become central to turning pretrained large language models (LLMs) into aligned and deployable systems. Recent progress spans supervised fine-tuning (SFT), preference optimization, reinforcement learning (RL), process supervision, verifier-guided methods, distillation, and multi-stage pipelines. Yet th…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts

2026-04-09 · Haolei Xu, Haiwen Hong, Hongxing Li, Rui Zhou, Yang Zhang, Longtao Huang, Hui Xue, Yongliang Shen, Weiming Lu, Yueting Zhuang

General AI

Multimodal Mixture-of-Experts (MoE) models have achieved remarkable performance on vision-language tasks. However, we identify a puzzling phenomenon termed Seeing but Not Thinking: models accurately perceive image content yet fail in subsequent reasoning, while correctly solving identical problems presented as pure tex…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

2026-04-09 · Xingyu Xia, Lekai Zhou, Yujie Tang, Xiaozhou Zhu, Hai Zhu, Wen Yao

General AI

Aerial vision-and-language navigation (Aerial VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and autonomously navigate complex three-dimensional environments by grounding language in visual perception. This survey provides a critical and analytical review of the Aerial VL…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

What do Language Models Learn and When? The Implicit Curriculum Hypothesis

2026-04-09 · Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja, Jen-tse Huang, Graham Neubig

General AI

Large language models (LLMs) can perform remarkably complex tasks, yet the fine-grained details of how these capabilities emerge during pretraining remain poorly understood. Scaling laws on validation loss tell us how much a model improves with additional compute, but not what skills it acquires in which order. To reme…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents

2026-04-28 · Zhou Hanlin, Chan Huah Yong

General AI

Long-horizon LLM tasks often fail not because a single answer is unattainable, but because knowledge states drift across rounds, intermediate commitments remain implicit, and interruption fractures the evolving evidence chain. This paper presents ADEMA as a knowledge-state orchestration architecture for long-horizon kn…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring

2026-04-28 · Hector G. Rodriguez, Marcus Rohrbach

General AI

Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering benchmarks approach saturation, reliable deployment requires satisfying low error tolerances in real-world out-of-distribution (OOD) scenarios. Precisely, selective predicti…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches

2026-05-06 · Pietro Bonazzi, Youssef Ahmed, Daniel Eckert, Andrea Ronco, Junjie Zeng, Dengxin Dai, Michele Magno

General AI

Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multi-modal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commer…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

2026-05-07 · Hao Dong, Hongzhao Li, Shupan Li, Muhammad Haris Khan, Eleni Chatzi, Olga Fink

General AI

Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly a…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Debiased Multimodal Personality Understanding through Dual Causal Intervention

2026-05-07 · Yangfu Zhu, Zitong Han, Nianwen Ning, Yuting Wei, Yuandong Wang, Hang Feng, Zhenzhou Shao

General AI

Multimodal personality understanding plays a critical role in human-centered artificial intelligence. Previous work mainly focuses on learning rich multimodal representations for video personality understanding. However, such approaches often suffer from potential harm caused by subject bias (e.g., observable age and unobservable mental …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

2026-05-07 · Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang

General AI

Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, prim…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.5

AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval

2026-03-17 · Shuvam Banerji Seal, Aheli Poddar, Alok Mishra, Dwaipayan Roy

General AI

This paper introduces AgriIR, a configurable retrieval-augmented generation (RAG) framework designed to deliver grounded, domain-specific answers while maintaining flexibility and low computational cost. Instead of relying on large, monolithic models, AgriIR decomposes the information access process into declarative mo…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.5

Learning from Many and Adapting to the Unknown in Open-set Test Streams

2026-04-01 · Xiao Zhang, Juntao Lyu, Tianyu Hu, Qianchuan Zhao, Huimin Ma

Research Track A · General AI

Large Language Models (LLMs) generalize across tasks via reusable representations and flexible reasoning, yet remain brittle in real deployment under evolving tasks and continual distribution shift. A common approach is Test-Time Adaptation (TTA), existing variants of which update models with hand-designed unsupervised ob…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.5

Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies

2026-04-01 · Zhanzhi Lou, Hui Chen, Yibo Li, Qian Wang, Bryan Hooi

Research Track B · General AI

Test-Time Learning (TTL) enables language agents to iteratively refine their performance through repeated interactions with the environment at inference time. At the core of TTL is an adaptation policy that updates the actor policy based on experience from previous episodes, thereby improving future behavior. Existing …

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.5

When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation

2026-04-12 · Sandro Andric

General AI

Large language models are increasingly used as agents in social, economic, and policy simulations. A common assumption is that stronger reasoning should improve simulation fidelity. We argue that this assumption can fail when the objective is not to solve a strategic problem, but to sample plausible boundedly rational …

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.5

Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks

2026-04-13 · Yuqing Yang, Tengxiao Liu, Wang Bill Zhu, Taiwei Shi, Linxin Song, Robin Jia

General AI

As LLM-based assistants become persistent and personalized, they must extract and retain useful information from past conversations as memory. However, the types of information worth remembering vary considerably across tasks. We formalize the heterogeneous memory extraction task and introduce BEHEMOTH, a benchmark tha…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.5

Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

2026-04-16 · Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh

General AI

Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results: it never sees how the corpus is organized or what it has not yet retrieved, limiting its ability to backtrack or combine scattered evidence. We present Corpus2Skill, which distil…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.5

Scaling Test-Time Compute for Agentic Coding

2026-04-16 · Joongwon Kim, Wannan Yang, Kelvin Niu, Hongming Zhang, Yun Zhu, Eryk Helenowski, Ruan Silva, Zhengxing Chen, Srinivasan Iyer, Manzil Zaheer, Daniel Fried, Hannaneh Hajishirzi, Sanjeev Arora, Gabriel Synnaeve, Ruslan Salakhutdinov, Anirudh Goyal

General AI

Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined. Long-horizon coding agents violate this premise: each attempt produces an extended trajectory of actions, observations, erro…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.5

Continual Hand-Eye Calibration for Open-world Robotic Manipulation

2026-04-17 · Fazeng Li, Gan Sun, Chenxi Liu, Yao He, Wei Cong, Yang Cong

Research Track A

Hand-eye calibration through visual localization is a critical capability for robotic manipulation in open-world environments. However, most deep learning-based calibration models suffer from catastrophic forgetting when adapting to unseen data amid open-world scene changes, while simple rehearsal-based continual …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.5

HyCal: A Training-Free Prototype Calibration Method for Cross-Discipline Few-Shot Class-Incremental Learning

2026-04-17 · Eunju Lee, MiHyeon Kim, JuneHyoung Kwon, Yoonji Lee, JiHyun Kim, Soojin Jang, YoungBin Kim

Research Track A · General AI

Pretrained Vision-Language Models (VLMs) like CLIP show promise in continual learning, but existing Few-Shot Class-Incremental Learning (FSCIL) methods assume homogeneous domains and balanced data distributions, limiting real-world applicability where data arises from heterogeneous disciplines with imbalanced sample av…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.5

AgentSearchBench: A Benchmark for AI Agent Search in the Wild

2026-04-24 · Bin Wu, Arastun Mammadli, Xiaoyu Zhang, Emine Yilmaz

General AI

The rapid growth of AI agent ecosystems is transforming how complex tasks are delegated and executed, creating a new challenge of identifying suitable agents for a given task. Unlike traditional tools, agent capabilities are often compositional and execution-dependent, making them difficult to assess from textual descr…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.4

Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization

2026-05-01 · Zi-Bo Qin, Feng-Feng Wei, Tai-You Chen, Wei-Neng Chen

General AI

Distributed black-box consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns, which often struggle…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.4

Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

2026-05-01 · Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin

General AI

Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

CASK: Core-Aware Selective KV Compression for Reasoning Traces

2026-04-13 · Buseong Kim, Heejun Gwon

Research Track A · General AI

In large language models performing long-form reasoning, the KV cache grows rapidly with decode length, creating bottlenecks in memory and inference stability. Existing reasoning-oriented KV compression has mostly followed an eviction-centered view: estimate token importance more accurately, then discard lower-ranked e…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

GenTac: Generative Modeling and Forecasting of Soccer Tactics

2026-04-13 · Jiayuan Rao, Tianlin Gui, Haoning Wu, Yanfeng Wang, Weidi Xie

General AI

Modeling open-play soccer tactics is a formidable challenge due to the stochastic, multi-agent nature of the game. Existing computational approaches typically produce single, deterministic trajectory forecasts or focus on highly structured set-pieces, fundamentally failing to capture the inherent variance and branching…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

A Sanity Check on Composed Image Retrieval

2026-04-14 · Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang

General AI

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is not well characterized by existing benchmarks, which inherently contain indeter…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Modeling Co-Pilots for Text-to-Model Translation

2026-04-14 · Serdar Kadioglu, Karthik Uppuluri, Akash Singirikonda

General AI

There is growing interest in leveraging large language models (LLMs) for text-to-model translation and optimization tasks. This paper aims to advance this line of research by introducing Text2Model and Text2Zinc. Text2Model is a suite of co-pilots based on several LLM strategies with varying …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Compressing Sequences in the Latent Embedding Space: K-Token Merging for Large Language Models

2026-04-16 · Zihao Xu, John Harvill, Ziwei Fan, Yizhou Sun, Hao Ding, Hao Wang

General AI

Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the number of tokens representing inputs. However, existing prompt-compression approache…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Beyond Distribution Sharpening: The Importance of Task Rewards

2026-04-17 · Sarthak Mittal, Leo Gagnon, Guillaume Lajoie

General AI

Frontier models have demonstrated exceptional capabilities following the integration of task-reward-based reinforcement learning (RL) into their training pipelines, enabling systems to evolve from pure reasoning models into sophisticated agents. However, debate persists regarding whether RL genuinely instills new skill…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Evaluating the Progression of Large Language Model Capabilities for Small-Molecule Drug Design

2026-04-17 · Shriram Chennakesavalu, Kirill Shmilovich, Hayley Weir, Colin Grambow, John Bradshaw, Patricia Suriana, Chen Cheng, Kangway Chuang

General AI

Large Language Models (LLMs) have the potential to accelerate small molecule drug design due to their ability to reason about information from diverse sources and formats. However, their practical utility remains unclear due to the lack of benchmarks that reflect real-world scenarios. In this work, we introduce a suite…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

FUSE: Ensembling Verifiers with Zero Labeled Data

2026-04-20 · Joonhyuk Lee, Virginia Ma, Sarah Zhao, Yash Nair, Asher Spector, Regev Cohen, Emmanuel J. Candès

General AI

Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground truth acquisition can be time-consuming and expensive. We introduce Fully Unsupervi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval

2026-04-20 · HaeJun Yoo, Yongseop Shin, Insung Lee, Myoung-Wan Koo, Du-Seong Chang

General AI

Audio-text retrieval systems based on Contrastive Language-Audio Pretraining (CLAP) achieve strong performance on traditional benchmarks; however, these benchmarks rely on caption-style queries that differ substantially from real-world search behavior, limiting their assessment of practical retrieval robustness. We pre…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Revisiting Change VQA in Remote Sensing with Structured and Native Multimodal Qwen Models

2026-04-20 · Yakoub Bazi, Mohamad M. Al Rahhal, Mansour Zuair, Faroun Mohamed

General AI

Change visual question answering (Change VQA) addresses the problem of answering natural-language questions about semantic changes between bi-temporal remote sensing (RS) images. Although vision-language models (VLMs) have recently been studied for temporal RS image understanding, Change VQA remains underexplored in th…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

When Can LLMs Learn to Reason with Weak Supervision?

2026-04-20 · Salman Rahman, Jingyan Shen, Anna Mordvina, Hamid Palangi, Saadia Gabriel, Pavel Izmailov

General AI

Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward signals becomes increasingly difficult, making it essential to understand when RLVR can succeed under weaker forms of sup…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

PlayCoder: Making LLM-Generated GUI Code Playable

2026-04-21 · Zhiyuan Peng, Wei Tao, Xin Yin, Chenhao Ying, Yuan Luo, Yiwen Guo

General AI

Large language models (LLMs) have achieved strong results in code generation, but their ability to generate GUI applications, especially games, remains insufficiently studied. Existing benchmarks mainly evaluate correctness through test cases, which are inadequate for GUI applications because these systems are interact…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

MCP Pitfall Lab: Exposing Developer Pitfalls in MCP Tool Server Security under Multi-Vector Attacks

2026-04-23 · Run Hao, Zhuoran Tan

General AI

Model Context Protocol (MCP) is increasingly adopted for tool-integrated LLM agents, but its multi-layer design and third-party server ecosystem expand risks across tool metadata, untrusted outputs, cross-tool flows, multimodal inputs, and supply-chain vectors. Existing MCP benchmarks largely measure robustness to mali…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

2026-04-23 · Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, Liqiang Nie

General AI

Composed Image Retrieval (CIR) is an important image retrieval paradigm that enables users to retrieve a target image using a multimodal query that consists of a reference image and modification text. Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typ…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

LARA: Validation-Driven Agentic Supercomputer Workflows for Atomistic Modeling

2026-04-24 · William Dawson, Louis Beal, Yoann Curé, Giuseppe Fisicaro, Dorian Rolland, Luigi Genovese

General AI

Large language models (LLMs) and agentic systems have recently demonstrated potential for automating scientific workflows, including atomistic simulations. However, their deployment in high-performance computing (HPC) environments remains limited by the lack of mechanisms ensuring correctness, reproducibility, and safe…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems

2026-04-24 · Mengzhuo Chen, Junjie Wang, Fangwen Mu, Yawen Wang, Zhe Liu, Huanxiang Feng, Qing Wang

General AI

Failure attribution, i.e., identifying the responsible agent and decisive step of a failure, is particularly challenging in LLM-based multi-agent systems (MAS) due to their natural-language reasoning, nondeterministic outputs, and intricate interaction dynamics. A reliable benchmark is therefore essential to guide and …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference

2026-04-27 · Zahra Dehghanighobadi, Asja Fischer

General AI

Long-context reasoning is a critical capability of large language models (LLMs), enabling applications such as long-document understanding, summarization, and code generation. However, efficient autoregressive inference relies on the key-value (KV) cache, whose memory footprint grows linearly with sequence length, lead…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Mela: Test-Time Memory Consolidation based on Transformation Hypothesis

2026-05-11 · Lungchuan Chen

Research Track A · General AI

Memory consolidation, the process by which transient experiences are transformed into stable, structured representations, is a foundational organizing principle in the human brain, yet it remains largely unexplored as a design principle for modern sequence models. In this work, we leverage established neuroscientific t…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

2026-05-12 · Miaosen Zhang, Xiaohan Zhao, Zhihong Tan, Zhou Huoshen, Yijia Fan, Yifan Yang, Kai Qiu, Bei Liu, Justin Wagle, Chenzhong Yin, Mingxi Cheng, Ji Li, Qi Dai, Chong Luo, Xu Yang, Xin Geng, Baining Guo

Research Track B · General AI

Computer-use agents (CUAs) automate on-screen work, as illustrated by GPT-5.4 and Claude. Yet their reliability on complex, low-frequency interactions is still poor, limiting user trust. Our analysis of failure cases from advanced models suggests a long-tail pattern in GUI operations, where a relatively small fraction …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Solve the Loop: Attractor Models for Language and Reasoning

2026-05-12 · Jacob Fein-Ashley, Paria Rashidinejad

General AI

Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, improving language modeling and reasoning. Yet recurrent architectures remain unstable to train, costly to optimize and deploy, and constrained to small, fixed recurrence depths. We intro…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.2

LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

2026-04-30 · Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng

General AI

Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing explicit linguistic reasoning that suffers from latency and discretization, or utilizing more expressive continuous latent…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.2

Synthetic Computers at Scale for Long-Horizon Productivity Simulation

2026-04-30 · Tao Ge, Baolin Peng, Hao Cheng, Jianfeng Gao

General AI

Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at S…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.2

Let ViT Speak: Generative Language-Image Pre-training

2026-05-01 · Yan Fang, Mengcheng Lan, Zilong Huang, Weixian Lei, Yunqing Zhao, Yujie Zhong, Yingchen Yu, Qi She, Yao Zhao, Yunchao Wei

General AI

In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining framework for Vision Transformers (ViTs) designed for multimodal large language models (MLLMs). To better align vision encoders with the autoregressive nature of LLMs, GenLI…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.2

Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion

2026-05-01 · Yuan Li, Jun Hu, Jiaxin Jiang, Bryan Hooi, Bingsheng He

General AI

Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constra…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.2

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

2026-05-01 · Sailesh Panda, Pritam Kadasi, Abhishek Upperwal, Mayank Singh

General AI

Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithfully execute the procedure specified in a prompt. We study this question through a controlled diagnostic benchmark for procedural execution, where models are given a st…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.2

Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection

2026-05-04 · Mohamad Khajezade, Fatemeh H. Fard, Mohamed Sami Shehata

General AI

Cross-language code clone detection (X-CCD) is challenging because semantically equivalent programs written in different languages often share little surface similarity. Although large language models (LLMs) have shown promise for semantic clone detection, their use as black-box systems raises concerns about cost, repr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.2

Visual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMs

2026-05-04 · Xin Zhang, Qiqi Tao, Jiawei Du, Moyun Liu, Joey Tianyi Zhou

General AI

Continuous latent-space reasoning offers a compact alternative to textual chain-of-thought for multimodal models, enabling high-dimensional visual evidence to be integrated without explicit reasoning tokens. However, we identify a previously overlooked optimization pathology in existing latent visual reasoning methods:…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.0

Reframing Long-Tailed Learning via Loss Landscape Geometry

2026-03-22 · Shenghan Chen, Yiming Liu, Yanzhen Wang, Yujia Wang, Xiankai Lu

Research Track A · General AI

Balancing the performance trade-off on long-tail (LT) data distributions remains a long-standing challenge. In this paper, we posit that this dilemma stems from a phenomenon called "tail performance degradation" (the model tends to severely overfit on head classes while quickly forgetting tail classes) and pose a solution …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 13.0

BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment

2026-03-25 · Risa Shinoda, Kaede Shiohara, Nakamasa Inoue, Kuniaki Saito, Hiroaki Santo, Fumio Okura

General AI

Understanding animal species from multimodal data poses an emerging challenge at the intersection of computer vision and ecology. While recent biological models, such as BioCLIP, have demonstrated strong alignment between images and textual taxonomic information for species identification, the integration of the audio …

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.0

Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

2026-03-25 · Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi, Jacob Andreas, Yoon Kim

General AI

Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mode. While this is generally not a problem for benchmark-style evaluations that assume one correct answer, many real-wor…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.0

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

2026-03-30 · He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, Zixian Huang, Sheng Yuan, Qinxiu Cheng, Xinchen Xie, Yicheng Chen, Yining Li, Jiaxing Xie, Huanan Dong, Yaguang Wu, Xiangjun Huang, Jian Yang, Hui Wang, Bowen Zhou, Bowen Li, Qipeng Guo, Kai Chen

General AI

We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.0

Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis

2026-04-01 · Xingxing Weng, Ruifeng Ni, Chao Pang, XiangYu Hao, Yishan Wang, Xiaokang Zhang, Wei Xu, Gui-Song Xia

Research Track A · General AI

Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.0

ProTPS: Prototype-Guided Text Prompt Selection for Continual Learning

2026-04-01 · Jie Mei, Li-Leng Peng, Keith Fuller, Jenq-Neng Hwang

Research Track A

For continual learning, text-prompt-based methods leverage text encoders and learnable prompts to encode semantic features for sequentially arrived classes over time. A common challenge encountered by existing works is how to learn unique text prompts, which implicitly carry semantic information of new classes, so that…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.0

Can LLMs Learn to Reason Robustly under Noisy Supervision?

2026-04-05 · Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen

General AI

Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis of noisy label mech…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.0

QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration

2026-04-05 · Satyam Kumar, Saurabh Jha

General AI

Deploying large language models (LLMs) on heterogeneous edge devices demands frameworks that jointly optimize energy efficiency, inference quality, and reliability. Our prior QEIL v1 (Kumar & Jha, 2026) achieved 4.82x IPW improvement but relied on static efficiency factors, greedy optimization, and unverified candidate…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.0

How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

2026-04-06 · Yujian Liu, Jiabao Ji, Li An, Tommi Jaakkola, Yang Zhang, Shiyu Chang

General AI

Agent skills, which are reusable, domain-specific knowledge artifacts, have become a popular mechanism for extending LLM-based agents, yet formal benchmarks of skill usage remain scarce. Existing skill benchmarking efforts focus on overly idealized conditions, where LLMs are directly provided with hand-cr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.0

Adaptive Data Dropout: Towards Self-Regulated Learning in Deep Neural Networks

2026-04-14 · Amar Gahir, Varshil Patel, Shreyank N Gowda

Research Track A · General AI

Deep neural networks are typically trained by uniformly sampling large datasets across epochs, despite evidence that not all samples contribute equally throughout learning. Recent work shows that progressively reducing the amount of training data can improve efficiency and generalization, but existing methods rely on f…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.0

PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research

2026-04-16 · Tingjia Miao, Wenkai Jin, Muhua Zhang, Jinxin Tan, Yuelin Hu, Tu Guo, Jiejun Zhang, Yuhan Wang, Wenbo Li, Yinuo Gao, Shuo Chen, Weiqi Jiang, Yayun Hu, Zixing Lei, Xianghe Pang, Zexi Liu, Yuzhi Zhang, Linfeng Zhang, Kun Chen, Wei Wang, Weinan E, Siheng Chen

General AI

The paradigm of agentic science requires AI systems to conduct robust reasoning and engage in long-horizon, autonomous exploration. However, current scientific benchmarks remain confined to domain knowledge comprehension and complex reasoning, failing to evaluate the exploratory nature and procedural complexity of real…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.0

Why Fine-Tuning Encourages Hallucinations and How to Fix It

2026-04-16 · Guy Kaplan, Zorik Gekhman, Zhen Zhu, Lotem Rozner, Yuval Reif, Swabha Swayamdipta, Derek Hoiem, Roy Schwartz

Research Track A · General AI

Large language models are prone to hallucinating factually incorrect statements. A key source of these errors is exposure to new factual information through supervised fine-tuning (SFT), which can increase hallucinations w.r.t. knowledge acquired during pre-training. In this work, we explore whether SFT-induced halluci…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.0

BAMI: Training-Free Bias Mitigation in GUI Grounding

2026-05-07 · Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu

Research Track B · General AI

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution metho…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.0

BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models

2026-05-07 · Xin Gao, Ruiyi Zhang, Meixi Du, Peijia Qin, Pengtao Xie

Research Track A · General AI

Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensiv…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.0

On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR

2026-05-07 · Hao Ye, Jisheng Dang, Junfeng Fang, Bimei Wang, Yizhou Zhang, Ning Lv, Wencan Zhang, Hong Peng, Bin Hu, Tat-Seng Chua

Research Track A · General AI

Recent extensive research has demonstrated that the enhanced reasoning capabilities acquired by models through Reinforcement Learning with Verifiable Rewards (RLVR) are primarily concentrated within the rank-1 components. Predicated on this observation, we employed Periodic Rank-1 Substitution and identified a counteri…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.9

Agentic Fusion of Large Atomic and Language Models to Accelerate Superconductors Discovery

2026-04-29 · Mingze Li, Yu Rong, Songyou Li, Lihong Wang, Jiacheng Cen, Liming Wu, Anyi Li, Zongzhao Li, Qiuliang Liu, Rui Jiao, Tian Bian, Pengju Wang, Hao Sun, Jianfeng Zhang, Ji-Rong Wen, Deli Zhao, Shifeng Jin, Tingyang Xu, Wenbing Huang

General AI

The discovery of novel materials is critical for global energy and quantum technology transitions. While deep learning has fundamentally reshaped this landscape, existing predictive or generative models typically operate in isolation, lacking the autonomous orchestration required to execute the full discovery process. …

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.9

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

2026-05-01 · Dongxin Guo, Jikun Wu, Siu Ming Yiu

Research Track B · General AI

AI agents execute tens to hundreds of chained LLM calls per task, yet GPU schedulers treat each call as independent, discarding gigabytes of intermediate state between steps and inflating end-to-end latency by 3-8x. We argue that this request-level abstraction is fundamentally mismatched to compound AI workloads, and p…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents

2026-03-25 · Bingqing Wei, Zhongyu Xia, Dingai Liu, Xiaoyu Zhou, Zhiwei Lin, Yongtao Wang

General AI

Vision-language models (VLMs) have shown remarkable general capabilities, yet embodied agents built on them fail at complex tasks, often skipping critical steps, proposing invalid actions, and repeating mistakes. These failures arise from a fundamental gap between the static training data of VLMs and the physical inter…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

2026-03-26 · Zirui Zhang, Haoyu Dong, Kexin Pei, Chengzhi Mao

General AI

Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify …

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

Vega: Learning to Drive with Natural Language Instructions

2026-03-26 · Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

General AI

Vision-language-action models have reshaped autonomous driving to incorporate languages into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To addr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

FairLLaVA: Fairness-Aware Parameter-Efficient Fine-Tuning for Large Vision-Language Assistants

2026-03-27 · Mahesh Bhosale, Abdul Wasi, Shantam Srivastava, Shifa Latif, Tianyu Luan, Mingchen Gao, David Doermann, Xuan Gong

General AI

While powerful in image-conditioned generation, multimodal large language models (MLLMs) can display uneven performance across demographic groups, highlighting fairness risks. In safety-critical clinical settings, such disparities risk producing unequal diagnostic narratives and eroding trust in AI-assisted decision-ma…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents

2026-03-29 · Zhaopeng Feng, Liangcai Su, Zhen Zhang, Xinyu Wang, Xiaotian Zhang, Xiaobin Wang, Runnan Fang, Qi Zhang, Baixuan Li, Shihao Cai, Rui Ye, Hui Chen, Jiang Yong, Joey Tianyi Zhou, Chenxiong Qian, Pengjun Xie, Bryan Hooi, Zuozhu Liu, Jingren Zhou

Research Track B · General AI

As large language models (LLMs) evolve into autonomous agents for long-horizon information-seeking, managing finite context capacity has become a critical bottleneck. Existing context management methods typically commit to a single fixed strategy throughout the entire trajectory. Such static designs may work well in so…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

See it to Place it: Evolving Macro Placements with Vision-Language Models

2026-03-30 · Ikechukwu Uchendu, Swati Goel, Karly Hou, Ebrahim Songhori, Kuang-Huei Lee, Joe Wenjie Jiang, Vijay Janapa Reddi, Vincent Zhuang

General AI

We propose using Vision-Language Models (VLMs) for macro placement in chip floorplanning, a complex optimization task that has recently shown promising advancements through machine learning methods. Because human designers rely heavily on spatial reasoning to arrange components on the chip canvas, we hypothesize that V…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

PsychAgent: An Experience-Driven Lifelong Learning Agent for Self-Evolving Psychological Counselor

2026-04-01 · Yutao Yang, Junsong Li, Qianjun Pan, Jie Zhou, Kai Chen, Qin Chen, Jingyuan Zhao, Ningning Zhou, Xin Li, Liang He

Research Track A · General AI

Existing methods for AI psychological counselors predominantly rely on supervised fine-tuning using static dialogue datasets. However, this contrasts with human experts, who continuously refine their proficiency through clinical practice and accumulated experience. To bridge this gap, we propose an Experience-Driven Li…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

Beyond Referring Expressions: Scenario Comprehension Visual Grounding

2026-04-02 · Ruozhen He, Nisarg A. Shah, Qihua Dong, Zilin Xiao, Jaywon Koo, Vicente Ordonez

General AI

Existing visual grounding benchmarks primarily evaluate alignment between image regions and literal referring expressions, where models can often succeed by matching a prominent named category. We explore a complementary and more challenging setting of scenario-based visual grounding, where the target must be inferred …

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

2026-04-02 · Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

General AI

Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill co…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

The Tool Illusion: Rethinking Tool Use in Web Agents

2026-04-03 · Renze Lou, Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Suman Nath, Wenpeng Yin, Jianfeng Gao

Research Track B · General AI

As web agents rapidly evolve, an increasing body of work has moved beyond conventional atomic browser interactions and explored tool use as a higher-level action paradigm. Although prior studies have shown the promise of tools, their conclusions are often drawn from limited experimental scales and sometimes non-compara…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

Assessing Large Language Models for Stabilizing Numerical Expression in Scientific Software

2026-04-06 · Tien Nguyen, Muhammad Ali Gulzar, Kirshanthan Sundararajah

General AI

Scientific software relies on high-precision computation, yet finite floating-point representations can introduce precision errors that propagate in safety-critical domains. Despite the growing use of large language models (LLMs) in scientific applications, their reliability in handling floating-point numerical stabili…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

2026-04-06 · Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen

General AI

Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position during RoPE, making representative queries very few, leading to poor top-…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

Leveraging Image Editing Foundation Models for Data-Efficient CT Metal Artifact Reduction

2026-04-07 · Ahmet Rasim Emirdagi, Süleyman Aslan, Mısra Yavuz, Görkay Aydemir, Yunus Bilge Kurt, Nasrin Rahimi, Burak Can Biner, M. Akın Yılmaz

General AI

Metal artifacts from high-attenuation implants severely degrade CT image quality, obscuring critical anatomical structures and posing a challenge for standard deep learning methods that require extensive paired training data. We propose a paradigm shift: reframing artifact reduction as an in-context reasoning task by a…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

Lightweight LLM Agent Memory with Small Language Models

2026-04-09 · Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang, Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, Yang Yang

Research Track A · General AI

Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation

2026-04-28 · Wei-Chun Chen, Yu-Xuan Chen, I-Fang Chung, Ying-Jia Lin

General AI

Accurate nutrient estimation from unstructured recipe text is an important yet challenging problem in dietary monitoring, due to ambiguous ingredient terminology and highly variable quantity expressions. We systematically evaluate models spanning a wide range of representational capacity, from lexical matching methods …

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

MarkIt: Training-Free Visual Markers for Precise Video Temporal Grounding

2026-04-28 · Pengcheng Fang, Yuxia Chen, Xiaohao Cai

General AI

Video temporal grounding (VTG) aims to localize the start and end timestamps of the event described by a given query within an untrimmed video. Despite the strong open-world video understanding and recognition ability of video language large models (Vid-LLMs), outputting precise temporal grounding information remains c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

2026-05-07 · Omar El Khalifi, Thomas Rossi, Oscar Fossey, Thibault Fouque, Ulysse Mizrahi, Philip Torr, Ivan Laptev, Fabio Pizzati, Baptiste Bellot-Gurlet

General AI

For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research

2026-05-07 · Lujia Zhong, Yihao Xia, Jianwei Zhang, Shuo Huang, Jiaxin Yue, Mingyang Xia, Yonggang Shi

General AI

Multimodal neuroimaging analysis often involves complex, modality-specific preprocessing workflows that require careful configuration, quality control, and coordination across heterogeneous toolchains. Beyond preprocessing, downstream statistical analysis and disease classification commonly require task-specific code, …

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.8

GLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extraction

2026-05-11 · Ihor Stepanov, Oleksandr Lukashov, Mykhailo Shtopko, Vivek Kalyanarangan

General AI

Joint named entity recognition (NER) and relation extraction (RE) is a fundamental task in natural language processing for constructing knowledge graphs from unstructured text. While recent approaches treat NER and RE as separate tasks requiring distinct models, we introduce GLiNER-Relex, a unified architecture that ex…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.6

Controllability in preference-conditioned multi-objective reinforcement learning

2026-05-11 · Pau de las Heras Molins, Beyazit Yalcinkaya, Lasse Peters, David Fridovich-Keil, Georgios Bakirtzis

General AI

Multi-objective reinforcement learning (MORL) allows a user to express preference over outcomes in terms of the relative importance of the objectives, but standard metrics cannot capture whether changes in preference reliably change the agent's behavior in the intended way, a property termed controllability. As a resul…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.5

LACE: Loss-Adaptive Capacity Expansion for Continual Learning

2026-03-30 · Shivnath Tathe

Research Track A

Fixed representational capacity is a fundamental constraint in continual learning: practitioners must guess an appropriate model width before training, without knowing how many distinct concepts the data contains. We propose LACE (Loss-Adaptive Capacity Expansion), a simple online mechanism that expands a model's repre…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.5

Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning

2026-04-01 · Mohammad R. Abu Ayyash

Research Track A · General AI

We present Brainstacks, a modular architecture for continual multi-domain fine-tuning of large language models that packages domain expertise as frozen adapter stacks composing additively on a shared frozen base at inference. Five interlocking components: (1) MoE-LoRA with Shazeer-style noisy top-2 routing across all s…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.5

When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation

2026-04-01 · Henry Peng Zou, Chunyu Miao, Wei-Chieh Huang, Yankai Chen, Yue Zhou, Hanrong Zhang, Yaozu Wu, Liancheng Fang, Zhengyao Gu, Zhen Zhang, Kening Zheng, Fangxin Wang, Yi Nian, Shanghao Li, Wenzhe Fan, Langzhou He, Weizhi Zhang, Xue Liu, Philip S. Yu

Research Track B · General AI

As LLM agents transition from short, static problem solving to executing complex, long-horizon tasks in dynamic environments, the ability to handle user interruptions, such as adding requirements or revising goals, during mid-task execution is becoming a core requirement for realistic deployment. However, existing bench…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.5

Many-Tier Instruction Hierarchy in LLM Agents

2026-04-10 · Jingyu Zhang, Tianjian Li, William Jurayj, Hongyuan Zhan, Benjamin Van Durme, Daniel Khashabi

General AI

Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant parad…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.5

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

2026-04-12 · Yu Li, Xiaoran Shang, Qizhi Pei, Yun Zhu, Xin Gao, Honglin Lin, Zhanping Zhong, Zhuoshi Pan, Zheng Liu, Xiaoyang Wang, Conghui He, Dahua Lin, Feng Zhao, Lijun Wu

General AI

Post-training data plays a pivotal role in shaping the capabilities of Large Language Models (LLMs), yet datasets are often treated as isolated artifacts, overlooking the systemic connections that underlie their evolution. To disentangle these complex relationships, we introduce the concept of data lineage to the LLM e…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.5

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

2026-04-20 · Jiaqi Wang, Haoge Deng, Ting Pan, Yang Liu, Chengyuan Wang, Fan Zhang, Yonggang Qi, Xinlong Wang

General AI

Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains largely unexplored. We observe that naively applying GRPO to UDM leads to training instability and marginal performance gains. To address thi…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.5

Learning Evidence Highlighting for Frozen LLMs

2026-04-24 · Shaoang Li, Yanhang Shi, Yufei Li, Mingfu Liang, Xiaohan Wei, Yunchen Pu, Fei Tian, Chonglin Sun, Frank Shyu, Luke Simon, Sandeep Pandey, Xi Liu, Jian Li

General AI

Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or rewriting the input, which can discard or …

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.5

Stabilizing Efficient Reasoning with Step-Level Advantage Selection

2026-04-27 · Han Wang, Xiaodong Yu, Jialian Wu, Jiang Liu, Ximeng Sun, Mohit Bansal, Zicheng Liu

General AI

Large language models (LLMs) achieve strong reasoning performance by allocating substantial computation at inference time, often generating long and verbose reasoning traces. While recent work on efficient reasoning reduces this overhead through length-based rewards or pruning, many approaches are post-trained under a …

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.5

Beyond Forgetting in Continual Medical Image Segmentation: A Comprehensive Benchmark Study

2026-05-07 · Bomin Wang, Hangqi Zhou, Yibo Gao, Xiahai Zhuang

Research Track A · General AI

Continual learning (CL) is essential for deploying medical image segmentation models in clinical environments where imaging domains, anatomical targets, and diagnostic tasks evolve over time. However, continual segmentation still faces three main challenges. First, the scenarios for this task remain insufficiently stan…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.5

L2P: Unlocking Latent Potential for Pixel Generation

2026-05-12 · Zhennan Chen, Junwei Zhu, Xu Chen, Jiangning Zhang, Jiawei Chen, Zhuoqi Zeng, Wei Zhang, Chengjie Wang, Jian Yang, Ying Tai

General AI

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.5

WildRelight: A Real-World Benchmark and Physics-Guided Adaptation for Single-Image Relighting

2026-05-12 · Lezhong Wang, Mehmet Onurcan Kaya, Siavash Bigdeli, Jeppe Revall Frisvad

General AI

Recent single-image relighting methods, powered by advanced generative models, have achieved impressive photorealism on synthetic benchmarks. However, their effectiveness in the complex visual landscape of the real world remains largely unverified. A critical gap exists, as current datasets are typically designed for m…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.4

NORACL: Neurogenesis for Oracle-free Resource-Adaptive Continual Learning

2026-04-29 · Karthik Charan Raghunathan, Christian Metzner, Laura Kriener, Melika Payvand

Research Track A · General AI

In a continual learning setting, we require a model to be plastic enough to learn a new task and stable enough to not disturb previously learned capabilities. We argue that this dilemma has an architectural root. A finite network has limited representational and plastic resources, yet the required capacity depends on p…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.4

When Does Structure Matter in Continual Learning? Dimensionality Controls When Modularity Shapes Representational Geometry

2026-04-30 · Kathrin Korte, Joachim Winter Pedersen, Eleni Nisioti, Sebastian Risi

Research Track A

To preserve previously learned representations, continual learning systems must strike a balance between plasticity, the ability to acquire new knowledge, and stability. This stability-plasticity dilemma affects how representations can be reused across tasks: shared structure enables transfer when tasks are similar but…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.4

Autonomous Drift Learning in Data Streams: A Unified Perspective

2026-05-02 · Xiaoyu Yang, En Yu, Jie Lu

Research Track A

In the pursuit of autonomous learning systems, the foundational assumption of stationarity, the premise that data distributions and model behaviors remain constant, is fundamentally untenable. Historically, the research community has addressed non-stationary environments almost exclusively under the scope of concept dr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

Ambivalence/Hesitancy Recognition in Videos for Personalized Digital Health Interventions

2026-04-13 · Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Lorenzo Sia, Nicolas Richet, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger

General AI

Using behavioural science, health interventions focus on behaviour change by providing a framework to help patients acquire and maintain healthy habits that improve medical outcomes. In-person interventions are costly and difficult to scale, especially in resource-limited regions. Digital health interventions offer a c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

StarVLA-α: Reducing Complexity in Vision-Language-Action Systems

2026-04-13 · Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia

General AI

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex: as existing approaches vary substantially in architectures, training data, embodiment configurations, and benchmark-specific en…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

LIFE -- an energy efficient advanced continual learning agentic AI framework for frontier systems

2026-04-14 · Anne Lee, Gurudutt Hosangadi

Research Track A · General AI

The rapid advancement of AI has changed the character of HPC usage, such as dimensioning, provisioning, and execution. Not only has energy demand been amplified, but existing rudimentary continual learning capabilities limit the ability of AI to effectively manage HPCs. This paper reviews emerging directions beyond monolith…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

Toward Autonomous Long-Horizon Engineering for ML Research

2026-04-14 · Guoxin Chen, Jie Chen, Lei Chen, Jiale Zhao, Fanzhe Meng, Wayne Xin Zhao, Ruihua Song, Cheng Chen, Ji-Rong Wen, Kai Jia

General AI

Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon e…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning

2026-04-16 · Kiran Purohit, Ramasuri Narayanam, Soumyabrata Pal

General AI

Speculative decoding (SD) accelerates large language model inference by allowing a lightweight draft model to propose outputs that a stronger target model verifies. However, its token-centric nature allows erroneous steps to propagate. Prior approaches mitigate this using external reward models, but incur additional la…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

Targeted Exploration via Unified Entropy Control for Reinforcement Learning

2026-04-16 · Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Ge Lan, Yue Wang

General AI

Recent advances in reinforcement learning (RL) have improved the reasoning capabilities of large language models (LLMs) and vision-language models (VLMs). However, the widely used Group Relative Policy Optimization (GRPO) consistently suffers from entropy collapse, causing the policy to converge prematurely and lose di…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

Semantic Area Graph Reasoning for Multi-Robot Language-Guided Search

2026-04-17 · Ruiyang Wang, Hao-Lun Hsu, Jiwoo Kim, Miroslav Pajic

General AI

Coordinating multi-robot systems (MRS) to search in unknown environments is particularly challenging for tasks that require semantic reasoning beyond geometric exploration. Classical coordination strategies rely on frontier coverage or information gain and cannot incorporate high-level task intent, such as searching fo…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents

2026-04-18 · Jinchang Zhu, Jindong Li, Cheng Zhang, Jiahong Liu, Menglin Yang

Research Track A · General AI

Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversation history as unstructured embedding vectors, retrieving information through semantic similarity. This paradigm fails to …

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

Beyond Meta-Reasoning: Metacognitive Consolidation for Self-Improving LLM Reasoning

2026-04-19 · Ziqing Zhuang, Linhai Zhang, Jiasheng Si, Deyu Zhou, Yulan He

Research Track A · General AI

Large language models (LLMs) have demonstrated strong reasoning capabilities, and as existing approaches for enhancing LLM reasoning continue to mature, increasing attention has shifted toward meta-reasoning as a promising direction for further improvement. However, most existing meta-reasoning methods remain episodic:…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

2026-04-21 · Yutian Chen, Shi Guo, Renbiao Jin, Tianshuo Yang, Xin Cai, Yawen Luo, Mingxin Yang, Mulin Yu, Linning Xu, Tianfan Xue

General AI

Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric cons…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling

2026-04-21 · Zhihong Zhang, Jie Zhao, Xiaojian Huang, Jin Xu, Zhuodong Luo, Xin Liu, Jiansheng Wei, Xuejin Chen

General AI

Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key challenges: lack of granularity in preference strength, textual styl…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views

2026-04-21 · Feihao Fang, My T. Thai, Yuanyuan Lei

General AI

Large Language Models (LLMs) still struggle with multi-step logical reasoning. Existing approaches either purely refine the reasoning chain in natural language form or attach a symbolic solver as an external module. In this work, we instead ask whether LLMs contain a shared internal logical subspace that simultaneously…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

FASTER: Value-Guided Sampling for Fast RL

2026-04-21 · Perry Dong, Alexander Swerdlow, Dorsa Sadigh, Chelsea Finn

General AI

Some of the most performant reinforcement learning algorithms today can be prohibitively expensive as they use test-time scaling methods such as sampling multiple action candidates and selecting the best one. In this work, we propose FASTER, a method for getting the benefits of sampling-based test-time scaling of diffu…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

Pause or Fabricate? Training Language Models for Grounded Reasoning

2026-04-21 · Yiwen Qiu, Linjuan Wu, Yizhou Liu, Yuchen Yan, Jin Ma, Xu Tan, Yao Hu, Daoxin Zhang, Wenqi Zhang, Weiming Lu, Jun Xiao, Yongliang Shen

General AI

Large language models have achieved remarkable progress on complex reasoning tasks. However, they often implicitly fabricate information when inputs are incomplete, producing confident but unreliable conclusions -- a failure mode we term ungrounded reasoning. We argue that this issue arises not from insufficient reason…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance

2026-04-22 · Yupeng Zheng, Xiang Li, Songen Gu, Yuhang Zheng, Shuai Tian, Weize Li, Linbo Wang, Senyu Fei, Pengfei Li, Yinfeng Gao, Zebin Xing, Yilun Chen, Qichao Zhang, Haoran Li, Wenchao Ding

General AI

Recent advances in Vision-Language-Action (VLA) models have opened new avenues for robot manipulation, yet existing methods exhibit limited efficiency and a lack of high-level knowledge and spatial awareness. To address these challenges, we propose PokeVLA, a lightweight yet powerful foundation model for embodied manip…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

2026-04-22 · Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele

General AI

Large vision-language models (LVLMs) have demonstrated impressive performance in various multimodal understanding and reasoning tasks. However, they still struggle with object hallucinations, i.e., the claim of nonexistent objects in the visual input. To address this challenge, we propose Region-aware Chain-of-Verifica…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

Context Unrolling in Omni Models

2026-04-23 · Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Chaorui Deng, Kunchang Li, Zihan Ding, Yuwei Guo, Fuyun Wang, Fangqi Zhu, Xiaonan Nie, Shenhan Zhu, Shanchuan Lin, Hongsheng Li, Weilin Huang, Guang Shi, Haoqi Fan

General AI

We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This p…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

MTT-Bench: Predicting Social Dominance in Mice via Multimodal Large Language Models

2026-04-24 · Yunquan Chen, Haoyu Chen

General AI

Understanding social dominance in animal behavior is critical for neuroscience and behavioral studies. In this work, we explore the capability of Multimodal Large Language Models (MLLMs) to analyze raw behavioral video of mice and predict their dominance hierarchy. We introduce MTT-Bench, a novel benchmark comprising an…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

NeuroClaw Technical Report

2026-04-27 · Cheng Wang, Zhibin He, Zhihao Peng, Shengyuan Liu, Yufan Hu, Lichao Sun, Xiang Li, Yixuan Yuan

General AI

Agentic artificial intelligence systems promise to accelerate scientific workflows, but neuroimaging poses unique challenges: heterogeneous modalities (sMRI, fMRI, dMRI, EEG), long multi-stage pipelines, and persistent reproducibility risks. To address this gap, we present NeuroClaw, a domain-specialized multi-agent re…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

2026-04-27 · Amal AKLI, Mike PAPADAKIS, Maxime CORDY, Yves Le TRAON

General AI

Large language models are increasingly used for code generation, yet the correctness of their outputs depends not only on model capability but also on how tasks are specified. Prior studies demonstrate that small changes in natural language prompts, particularly under-specification, can substantially reduce code correct…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.2

Machine Collective Intelligence for Explainable Scientific Discovery

2026-04-30 · Gyoung S. Na, Chanyoung Park

Research Track A · General AI

Deriving governing equations from empirical observations is a longstanding challenge in science. Although artificial intelligence (AI) has demonstrated substantial capabilities in function approximation, the discovery of explainable and extrapolatable equations remains a fundamental limitation of modern AI, posing a ce…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.2

PhyCo: Learning Controllable Physical Priors for Generative Motion

2026-04-30 · Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan, Manmohan Chandraker

General AI

Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded co…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.2

Generating Statistical Charts with Validation-Driven LLM Workflows

2026-05-01 · Pavlin G. Poličar, Andraž Pevcin, Blaž Zupan

General AI

Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-ans…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.2

SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

2026-05-04 · Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong

General AI

Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, assigning high rewards to flawed reasoni…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.0

Learn2Fold: Structured Origami Generation with World Model Planning

2026-02-02 · Yanjia Huang, Yunuo Chen, Ying Jiang, Jinru Han, Zhengzhong Tu, Yin Yang, Chenfanfu Jiang

General AI

The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.0

Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development

2026-03-04 · Hung Tran, Langston Nashold, Rayan Krishnan, Antoine Bigeard, Alex Gu

Research Track B · General AI

Code generation has emerged as one of AI's highest-impact use cases, yet existing benchmarks measure isolated tasks rather than the complete "zero-to-one" process of building a working application from scratch. We introduce Vibe Code Bench, a benchmark of 100 web application specifications (50 public validation, 50 hel…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 12.0

ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

2026-03-19 · Haochen Zhao, Shaoyang Cui

Research Track B · General AI

Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 12.0

AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation

2026-03-22 · Liang Ding

Research Track B · General AI

LLM-as-Judge evaluation fails agent tasks because a fixed rubric cannot capture what matters for this task: code debugging demands Correctness and Error Handling; web navigation demands Goal Alignment and Action Efficiency. We present ADARUBRIC, which closes this gap by generating task-specific evaluation rubrics on th…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.0

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

2026-03-29 · Meituan LongCat Team, Bin Xiao, Chao Wang, Chengjiang Li, Chi Zhang, Chong Peng, Hang Yu, Hao Yang, Haonan Yan, Haoze Sun, Haozhe Zhao, Hong Liu, Hui Su, Jiaqi Zhang, Jiawei Wang, Jing Li, Kefeng Zhang, Manyuan Zhang, Minhao Jing, Peng Pei, Quan Chen, Taofeng Xue, Tongxin Pan, Xiaotong Li, Xiaoyang Li, Xiaoyu Zhao, Xing Hu, Xinyang Lin, Xunliang Cai, Yan Bai, Yan Feng, Yanjie Li, Yao Qiu, Yerui Sun, Yifan Lu, Ying Luo, Yipeng Mei, Yitian Chen, Yuchen Xie, Yufang Liu, Yufei Chen, Yulei Qian, Yuqi Peng, Zhihang Yu, Zhixiong Han, Changran Wang, Chen Chen, Dian Zheng, Fengjiao Chen, Ge Yang, Haowei Guo, Haozhe Wang, Hongyu Li, Huicheng Jiang, Jiale Hong, Jialv Zou, Jiamu Li, Jianping Lin, Jiaxing Liu, Jie Yang, Jing Jin, Jun Kuang, Juncheng She, Kunming Luo, Kuofeng Gao, Lin Qiu, Linsen Guo, Mianqiu Huang, Qi Li, Qian Wang, Rumei Li, Siyu Ren, Wei Wang, Wenlong He, Xi Chen, Xiao Liu, Xiaoyu Li, Xu Huang, Xuanyu Zhu, Xuezhi Cao, Yaoming Zhu, Yifei Cao, Yimeng Jia, Yizhen Jiang, Yufei Gao, Zeyang Hu, Zhenlong Yuan, Zijian Zhang, Ziwen Wang

General AI

The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented architectures and subopt…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.0

COvolve: Adversarial Co-Evolution of Large-Language-Model-Generated Policies and Environments via Two-Player Zero-Sum Game

2026-03-30 · Alkis Sygkounas, Rishi Hazra, Andreas Persson, Pedro Zuidberg Dos Martires, Amy Loutfi

Research Track A · General AI

A central challenge in building continually improving agents is that training environments are typically static or manually constructed. This restricts continual learning and generalization beyond the training distribution. We address this with COvolve, a co-evolutionary framework that leverages large language models (…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.0

Emergence WebVoyager: Toward Consistent and Transparent Evaluation of (Web) Agents in The Wild

2026-03-30 · Deepak Akkil, Mowafak Allaham, Amal Raj, Tamer Abuelsaad, Ravi Kokku

Research Track B · General AI

Reliable evaluation of AI agents operating in complex, real-world environments requires methodologies that are robust, transparent, and contextually aligned with the tasks agents are intended to perform. This study identifies persistent shortcomings in existing AI agent evaluation practices that are particularly acute …

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.0

Experience Transfer for Multimodal LLM Agents in Minecraft Game

2026-04-07 · Chenghao Li, Jun Liu, Songbo Zhang, Huadong Jian, Hao Ni, Lik-Hang Lee, Sung-Ho Bae, Guoqing Wang, Yang Yang, Chaoning Zhang

General AI

Multimodal LLM agents operating in complex game environments must continually reuse past experience to solve new tasks efficiently. In this work, we propose Echo, a transfer-oriented memory framework that enables agents to derive actionable knowledge from prior interactions rather than treating memory as a passive repo…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.0

Lighting-grounded Video Generation with Renderer-based Agent Reasoning

2026-04-09 · Ziqi Cai, Taoyu Yang, Zheng Chang, Si Li, Han Jiang, Shuchen Weng, Boxin Shi

General AI

Diffusion models have achieved remarkable progress in video generation, but their controllability remains a major limitation. Key scene factors such as layout, lighting, and camera trajectory are often entangled or only weakly modeled, restricting their applicability in domains like filmmaking and virtual production wh…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.0

From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation

2026-04-14 · Chuang Peng, Wei Zhang, Renshuai Tao, Xinhao Zhang, Jian Yang

Research Track B · General AI

Text-based web agents offer computational efficiency for autonomous web navigation, yet developing robust agents remains challenging due to the noisy and heterogeneous nature of real-world HTML. Standard Supervised Fine-Tuning (SFT) approaches fail in two critical dimensions: they lack discrimination capabilities to re…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.0

Mistake gating leads to energy and memory efficient continual learning

2026-04-15 · Aaron Pache, Mark CW van Rossum

Research Track A · General AI

Synaptic plasticity is metabolically expensive, yet animals continuously update their internal models without exhausting energy reserves. However, when artificial neural networks are trained, the network parameters are typically updated on every sample that is presented, even if the sample was classified correctly. Ins…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.0

GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction

2026-04-27 · Hongxin Li, Yuntao Chen, Zhaoxiang Zhang

Research Track B · General AI

Graphical User Interface (GUI) element grounding (precisely locating elements on screenshots based on natural language instructions) is fundamental for agents interacting with GUIs. Deploying this capability directly on resource-constrained devices like mobile phones is increasingly critical for GUI agents requiring lo…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.0

PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

2026-05-07 · Xinmiao Huang, Jinwei Hu, Rajarshi Roy, Changshun Wu, Yi Dong, Xiaowei Huang

Research Track B · General AI

Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. We introduce PrefixG…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.0

WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation

2026-05-08 · Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao, Tianqing Zhu, Xiaohua Jia

Research Track B · General AI

Browser agents are increasingly deployed in long-horizon tasks, which require executing extended action chains to accomplish user goals. However, this prolonged execution process provides attackers with more opportunities to inject malicious instructions. Existing prompt injection attacks against browser agents expose …

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation

2026-03-25 · Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu

Research Track A · General AI

Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks remains fundamentally limited by the absence of long-term memory, causal failure attribution, and dynamic intervention c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

2026-03-26 · Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li

General AI

Human driving behavior is inherently personal, shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectiv…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

Label-Free Cross-Task LoRA Merging with Null-Space Compression

2026-03-27 · Wonyoung Lee, Wooseong Jeong, Kuk-Jin Yoon

General AI

Model merging combines independently fine-tuned checkpoints without joint multi-task training. In the era of foundation models, fine-tuning with Low-Rank Adaptation (LoRA) is prevalent, making LoRA merging a promising target. Existing approaches can work in homogeneous settings where all target tasks are classification …

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding

2026-03-30 · Haozhe Qi, Kevin Qu, Mahdi Rad, Rui Wang, Alexander Mathis, Marc Pollefeys

General AI

Long video understanding remains challenging for Multi-modal Large Language Models (MLLMs) due to high memory costs and context-length limits. Prior approaches mitigate this by scoring and selecting frames/tokens within short clips, but they lack a principled mechanism to (i) compare relevance across distant video clip…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning

2026-03-30 · Philip Schroeder, Thomas Weng, Karl Schmeckpeper, Eric Rosen, Stephen Hart, Ondrej Biza

General AI

Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enablin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

Do Emotions in Prompts Matter? Effects of Emotional Framing on Large Language Models

2026-04-02 · Minda Zhao, Yutong Yang, Chufei Peng, Rachel Gonsalves, Weiyue Li, Ruyi Yang, Zhixi Liu, Mengyu Wang

General AI

Emotional tone is pervasive in human communication, yet its influence on large language model (LLM) behaviour remains unclear. Here, we examine how first-person emotional framing in user-side queries affects LLM performance across six benchmark domains, including mathematical reasoning, medical question answering, readi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

Model-Based Reinforcement Learning for Control under Time-Varying Dynamics

2026-04-02 · Klemens Iten, Bruce Lee, Chenhao Li, Lenart Treven, Andreas Krause, Bhavya Sukhija

General AI

Learning-based control methods typically assume stationary system dynamics, an assumption often violated in real-world systems due to drift, wear, or changing operating conditions. We study reinforcement learning for control under time-varying dynamics. We consider a continual model-based reinforcement learning setting…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

Read More, Think More: Revisiting Observation Reduction for Web Agents

2026-04-02 · Masafumi Enomoto, Ryoma Obara, Haochen Zhang, Masafumi Oyamada

Research Track B · General AI

Web agents based on large language models (LLMs) rely on observations of web pages (commonly represented as HTML) as the basis for identifying available actions and planning subsequent steps. Prior work has treated the verbosity of HTML as an obstacle to performance and adopted observation reduction as a standard p…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning

2026-04-04 · Hessen Bougueffa Eutamene, Abdellah Zakaria Sellam, Abdelmalik Taleb-Ahmed, Abdenour Hadid

Research Track A · General AI

Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models; however, while pixel-level artifacts vary across models, frequency-domain signatures exhibit greater consistency, providing a promising foundation for cross-generator…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

Early Stopping for Large Reasoning Models via Confidence Dynamics

2026-04-06 · Parsa Hosseini, Sumit Nawathe, Mahdi Salmani, Meisam Razaviyayn, Soheil Feizi

General AI

Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the final answer. In this wo…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

Paper Espresso: From Paper Overload to Research Insight

2026-04-06 · Mingzhe Du, Luu Anh Tuan, Dong Huang, See-kiong Ng

Research Track A · General AI

The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries w…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

2026-04-06 · LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, Aviral Kumar

General AI

Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" m…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

GRASS: Gradient-based Adaptive Layer-wise Importance Sampling for Memory-efficient Large Language Model Fine-tuning

2026-04-09 · Kaiyuan Tian, Yu Tang, Gongqingjian Jiang, Baihui Liu, Yifu Gao, Xialin Su, Linbo Qiao, Dongsheng Li

General AI

Full-parameter fine-tuning of large language models is constrained by substantial GPU memory requirements. Low-rank adaptation methods mitigate this challenge by updating only a subset of parameters. However, these approaches often limit model expressiveness and yield lower performance than full-parameter fine-tuning. …

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

RewardFlow: Generate Images by Optimizing What You Reward

2026-04-09 · Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou

General AI

We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

2026-05-07 · Daniel Zheng, Ingrid von Glehn, Yori Zwols, Iuliya Beloshapka, Lars Buesing, Daniel M. Roy, Martin Wattenberg, Bogdan Georgiev, Tatiana Schmidt, Andrew Cowie, Fernanda Viegas, Dimitri Kanevsky, Vineet Kahlon, Hartmut Maennel, Sophia Alj, George Holland, Alex Davies, Pushmeet Kohli

General AI

We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature search, computation…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

Recursive Agent Optimization

2026-05-07 · Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig

General AI

We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.5

Bi-CRCL: Bidirectional Conservative-Radical Complementary Learning with Pre-trained Foundation Models for Class-incremental Medical Image Analysis

2026-03-24 · Xinyao Wu, Zhe Xu, Cheng Chen, Jiawei Ma, Yefeng Zheng, Raymond Kai-yu Tong

Research Track A · General AI

Class-incremental learning (CIL) in medical image-guided diagnosis requires retaining prior diagnostic knowledge while adapting to newly emerging disease categories, which is critical for scalable clinical deployment. This problem is particularly challenging due to heterogeneous data and privacy constraints that preven…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 11.5

Critical Patch-Aware Sparse Prompting with Decoupled Training for Continual Learning on the Edge

2026-04-08 · Wonseon Lim, Jaesung Lee, Dae-Won Kim

Research Track A · General AI

Continual learning (CL) on edge devices requires not only high accuracy but also training-time efficiency to support on-device adaptation under strict memory and computational constraints. While prompt-based continual learning (PCL) is parameter-efficient and achieves competitive accuracy, prior work has focused mainly…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.5

SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation

2026-04-10 · Han Luo, Guy Laban

General AI

Large language models are increasingly deployed in multi-turn settings such as tutoring, support, and counseling, where reliability depends on preserving consistent roles, personas, and goals across long horizons. This requirement becomes critical when LLMs are used to generate synthetic dialogues for training and eval…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.5

HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System

2026-04-15 · Tianshuo Yang, Guanyu Chen, Yutian Chen, Zhixuan Liang, Yitian Liu, Zanxin Chen, Chunpu Xu, Haotian Liang, Jiangmiao Pang, Yao Mu, Ping Luo

General AI

While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often compromises the profound reasoning capabilities inherited from their base Vision-Language Models (VLMs). To resolve this fundamental trade-off, we propose HiVLA, a visu…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.5

Reinforcement Learning via Value Gradient Flow

2026-04-15 · Haoran Xu, Kaiwen Hu, Somayeh Sojoudi, Amy Zhang

General AI

We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base model in LLM RL finetuning) is essential to prevent value over-optimization caused by erroneous out-of-distribution extrapolation. Existing methods either rely on repara…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.5

A Semantic Geometry for Uncovering Paradigm Dynamics via Scientific Publications

2026-04-16 · Jinchang Liu, Qingshan Zhou, Hongkan Chen, Yi Bu

Research Track A

Science advances not only by accumulating discovered patterns but by changing how new problems and solutions are expressed. While structural indicators track scholarly attention, they offer only an indirect proxy for the reorganization of meaning. We propose a semantic geometry based on the R-P-C (references, focal pub…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.5

The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation

2026-04-18 · Jiaxin Zhang, Xiangyu Peng, Qinglin Chen, Qinyuan Ye, Caiming Xiong, Chien-Sheng Wu

Research Track A

On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps models in severe overconfidence. We trace this failure to an information mismatch: t…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.5

Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting

2026-04-20 · Hyeonseo Jang, Hyuk Kwon, Kibok Lee

Research Track A

We investigate recently introduced domain-class incremental learning scenarios for vision-language models (VLMs). Recent works address this challenge using parameter-efficient methods, such as prefix-tuning or adapters, which facilitate model adaptation to downstream tasks by incorporating task-specific information int…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.5

SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

2026-04-22 · Shanshan Zhong, Yi Lu, Jingjie Ning, Yibing Wan, Lihan Feng, Yuyi Ao, Leonardo F. R. Ribeiro, Markus Dreyer, Sean Ammirati, Chenyan Xiong

Research Track A · General AI

Skills have become the de facto way to enable LLM agents to perform complex real-world tasks with customized instructions, workflows, and tools, but how to learn them automatically and effectively remains unclear. We introduce SkillLearnBench, the first benchmark for evaluating continual skill learning methods, compris…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.5

Efficient Agent Evaluation via Diversity-Guided User Simulation

2026-04-23 · Itay Nakash, George Kour, Ateret Anaby-Tavor

General AI

Large language models (LLMs) are increasingly deployed as customer-facing agents, yet evaluating their reliability remains challenging due to stochastic, multi-turn interactions. Current evaluation protocols rely on linear Monte Carlo rollouts of complete agent-user conversations to estimate success. However, this appr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.5

Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning

2026-05-06 · William T. Redman, Erik C. Johnson, Brian Robinson

Research Track A · General AI

Identifying and exploiting common features across domains is at the heart of the human ability to make analogies, and is believed to be crucial for the ability to continually learn. To do this successfully, general and flexible computational strategies must be developed. While the extent to which Transformer neural net…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.5

Scene-Adaptive Continual Learning for CSI-based Human Activity Recognition with Mixture of Experts

2026-05-07 · Wenhan Zheng, Yuyi Mao, Ivan Wang-Hei Ho

Research Track A

Channel state information (CSI)-based human activity recognition (HAR) is vulnerable to performance degradation under domain shifts across varying physical environments. Continual learning (CL) offers a principled way to learn new domains sequentially while preserving past knowledge, but existing CL solutions for CSI-b…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.4

Efficient Training on Multiple Consumer GPUs with RoundPipe

2026-04-29 · Yibin Luo, Shiwei Gao, Huichuan Zheng, Youyou Lu, Jiwu Shu

General AI

Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism (PP) combined with CPU offloading mitigates these hardware bottlenecks by reducing communication overhead. However, existing PP schedules suffer fr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.4

CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness

2026-04-30 · Haofei Yu, Yining Zhao, Lenore Blum, Manuel Blum, Paul Pu Liang

Research Track B · General AI

Despite remarkable advances, today's AI systems remain narrow in scope, falling short of the flexible, adaptive, and multisensory intelligence that characterizes human capabilities. This gap has fueled longstanding debates about whether AI might one day achieve human-like generality or even consciousness, and whether t…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 11.4

Leveraging Verifier-Based Reinforcement Learning in Image Editing

2026-04-30 · Hanzhong Guo, Jie Wu, Jie Liu, Yu Gao, Zilyu Ye, Linxiao Yuan, Xionghui Wang, Yizhou Yu, Weilin Huang

General AI

While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is the lack of a robust general reward model for all editing tasks. Existing edit reward models usually give overall scores wi…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.4

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

2026-05-01 · Indraneil Paul, Goran Glavaš, Iryna Gurevych

General AI

Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.4

A Compound AI Agent for Conversational Grant Discovery

2026-05-04 · Zhisheng Tang, Mayank Kejriwal

Research Track B · General AI

Research funding discovery remains fundamentally fragmented: researchers navigate disparate agency portals (e.g., in the United States, NSF, NIH, DARPA, Grants.gov, and many others) with heterogeneous interfaces, search capabilities, and data schemas. We present a compound AI system that unifies this landscape through …

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Distorted or Fabricated? A Survey on Hallucination in Video LLMs

2026-04-14 · Yiyang Huang, Yitian Zhang, Yizhou Wang, Mingyuan Zhang, Liang Shi, Huimin Zeng, Yun Fu

General AI

Despite significant progress in video-language modeling, hallucinations remain a persistent challenge in Video Large Language Models (Vid-LLMs), referring to outputs that appear plausible yet contradict the content of the input video. This survey presents a comprehensive analysis of hallucinations in Vid-LLMs and intro…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness

2026-04-14 · Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram

General AI

Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness when trivially constrained? We show that simple lexical constraints (banning a single punctuation character or common word) cause instruction-tuned LLMs to collapse their responses, losing 14–48% of compre…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Parallax: Why AI Agents That Think Must Never Act

2026-04-14 · Joel Fokou

General AI

Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise applications will embed AI copilots by the end of 2026. As agents gain the ability to execute real-world actions (reading files, running commands, making network requests, modify…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

2026-04-16 · Marcel Wagenländer, Otto White, Britannio Jarrett, Pedro Silvestre, Yanda Tao, Guo Li, Huanzhou Zhu, Lluís Vilanova, Peter Pietzuch

General AI

Agentic workflows carry out complex tasks by orchestrating multiple large language models (LLMs) and tools. Serving such workflows at a target throughput with low latency is challenging because they can be defined using arbitrary agentic frameworks and exhibit unpredictable execution times: execution may branch, fan-ou…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

The Agentification of Scientific Research: A Physicist's Perspective

2026-04-16 · Xiao-Liang Qi

General AI

This article argues that the most important significance of the AI revolution, especially the rise of large language models, lies not simply in automation, but in a fundamental change in how complex information and human know-how are carried, replicated, and shared. From this perspective, AI for Science is especially i…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation

2026-04-17 · Yi Lin, Yihao Ding, Yonghui Wu, Yifan Peng

General AI

Automated 3D radiology report generation often suffers from clinical hallucinations and a lack of the iterative verification found in human practice. While recent Vision-Language Models (VLMs) have advanced the field, they typically operate as monolithic "black-box" systems without the collaborative oversight character…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

2026-04-20 · Xirui Li, Ming Li, Derry Xu, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh, Tianyi Zhou

General AI

Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diverse, verified environments on demand. To this end, we introduce ClawEnvKit, an aut…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering

2026-04-20 · Manan Gupta, Dhruv Kumar

General AI

Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce Latent Phase-Shift Rollback (LPSR): at each generation step, we monitor the residual stream at a critical layer l_crit,…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Multilingual Training and Evaluation Resources for Vision-Language Models

2026-04-20 · Daniela Baiamonte, Elena Fano, Matteo Gabburo, Stefano Simonazzi, Leonardo Rigutini, Andrea Zugarini

General AI

Vision Language Models (VLMs) have achieved rapid progress in recent years. However, despite their growth, VLM development is heavily grounded in English, leading to two main limitations: (i) the lack of multilingual and multimodal datasets for training, and (ii) the scarcity of comprehensive evaluation benchmarks acro…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning

2026-04-21 · Xianming Li, Zongxi Li, Tsz-fung Andrew Lee, Jing Li, Haoran Xie, Qing Li

General AI

Parameter-efficient fine-tuning (PEFT) reduces the training cost of full-parameter fine-tuning for large language models (LLMs) by training only a small set of task-specific parameters while freezing the pretrained backbone. However, existing approaches, such as Low-Rank Adaptation (LoRA), achieve adaptation by inserti…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Diagnosing CFG Interpretation in LLMs

2026-04-22 · Hanqi Li, Lu Chen, Kai Yu

General AI

As LLMs are increasingly integrated into agentic systems, they must adhere to dynamically defined, machine-interpretable interfaces. We evaluate LLMs as in-context interpreters: given a novel context-free grammar, can LLMs generate syntactically valid, behaviorally functional, and semantically faithful outputs? We intr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Agentic Artificial Intelligence in Finance: A Comprehensive Survey

2026-04-23 · Irene Aldridge, Jolie An, Riley Burke, Michael Cao, Chia-Yi Chien, Kexin Deng, Ruipeng Deng, Yichen Gao, Olivia Guo, Shunran He, Zheng Li, George Lin, Weihang Lin, Percy Lyu, Alex Ng, Qi Wang, Hanxi Xiao, Dora Xu, Yuanyuan Xue, Sheng Zhang, Sirui Zhang, Yun Zhang, Sirui Zhao, Xiaolong Zhao, Yihan Zhao, Waner Zheng

General AI

The emergence of agentic artificial intelligence (AI) represents a fundamental transformation in financial markets, characterized by autonomous systems capable of reasoning, planning, and adaptive decision-making with minimal human intervention. This comprehensive survey synthesizes recent advances in agentic AI across…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Long-Horizon Manipulation via Trace-Conditioned VLA Planning

2026-04-23 · Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu

General AI

Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated ta…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

2026-04-23 · Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong, Jaehong Kim, Meeyoung Cha

General AI

Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two e…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models

2026-04-23 · Naheed Rayhan, Sohely Jahan

General AI

Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection (TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial intent across …

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

ATRS: Adaptive Trajectory Re-splitting via a Shared Neural Policy for Parallel Optimization

2026-04-24 · Jiajun Yu, Guodong Liu, Li Wang, Pengxiang Zhou, Wentao Liu, Yin He, Chao Xu, Fei Gao, Yanjun Cao

General AI

Parallel trajectory optimization via the Alternating Direction Method of Multipliers (ADMM) has emerged as a scalable approach to long-horizon motion planning. However, existing frameworks typically decompose the problem into parallel subproblems based on a predefined fixed structure. Such structural rigidity often cau…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks

2026-04-24 · Longju Bai, Zhemin Huang, Xingyao Wang, Jiao Sun, Rada Mihalcea, Erik Brynjolfsson, Alex Pentland, Jiaxin Pei

General AI

The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questions naturally arise: (1) Where do AI agents spend the tokens? (2) Which models are more token-efficient? and (3) Can agen…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Long-Context Aware Upcycling: A New Frontier for Hybrid LLM Scaling

2026-04-27 · Parsa Ashrafi Fashi, Utkarsh Saxena, Mehdi Rezagholizadeh, Aref Jafari, Akash Haridas, Mingyu Yang, Vansh Bhatia, Guihong Li, Vikram Appia, Emad Barsoum

General AI

Hybrid sequence models that combine efficient Transformer components with linear sequence modeling blocks are a promising alternative to pure Transformers, but most are still pretrained from scratch and therefore fail to reuse existing Transformer checkpoints. We study upcycling as a practical path to convert pretraine…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications

2026-04-27 · Zhenyu Zhao, Aparna Balagopalan, Adi Agrawal, Dilshoda Yergasheva, Waseem Alshikh, Daniel M. Bikel

General AI

Given the increased use of LLMs in financial systems today, it becomes important to evaluate the safety and robustness of such systems. One failure mode that LLMs frequently display in general domain settings is that of sycophancy. That is, models prioritize agreement with expressed user beliefs over correctness, leadi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.3

Masked Generative Transformer Is What You Need for Image Editing

2026-05-11 · Wei Chow, Linfeng Li, Xian Sun, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, Tian Ye, Xian Wang, Jinbin Bai, Shilin Xu, Xiangtai Li, Junting Pan, Shaoteng Liu, Ran Zhou, Tianshu Yang, Songhua Liu

Research Track A · General AI

Diffusion models dominate image editing, yet their global denoising mechanism entangles edited regions with surrounding context, causing modifications to propagate into areas that should remain intact. We propose a fundamentally different approach by leveraging Masked Generative Transformers (MGTs), whose localized tok…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 11.3

From Web to Pixels: Bringing Agentic Search into Visual Perception

2026-05-12 · Bokang Yang, Xinyi Sun, Kaituo Feng, Xingping Dong, Dongming Wu, Xiangyu Yue

General AI

Visual perception connects high-level semantic understanding to pixel-level perception, but most existing settings assume that the decisive evidence for identifying a target is already in the image or frozen model knowledge. We study a more practical yet harder open-world case where a visible object must first be resol…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 11.3

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

2026-05-12 · Gunjan, Sidahmed Benabderrahmane, Talal Rahwan

General AI

Large Language Models (LLMs) can generate fluent political text at scale, raising concerns about synthetic discourse during crises and social conflict. Existing AI-text detection often focuses on sentence-level cues such as perplexity, burstiness, or token irregularities, but these signals may weaken as generative syst…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 11.2

Electricity price forecasting across Norway's five bidding zones in the post-crisis era

2026-04-29 · My Thi Diem Phan, Trung Tuyen Truong, Hoai Phuong Ha, Dat Thanh Nguyen

General AI

Norway's electricity market is heavily dominated by hydropower, but the 2021–2022 energy crisis and stronger integration with Continental Europe have fundamentally altered price formation, reducing the reliability of forecasting models calibrated on historical data. Despite the critical need for updated models, a unif…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

Multimodal LLMs are not all you need for Pediatric Speech Language Pathology

2026-04-29 · Darren Fürst, Sebastian Steindl, Ulrich Schäfer

General AI

Speech Sound Disorders (SSD) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular multi-task SLPHelmUltraSuitePlus benchmark. We propose a cascading approach from binary …

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

Virtual-reality based patient-specific simulation of spine surgical procedures: A fast, highly automated and high-fidelity system for surgical education and planning

2026-04-29 · Raj Kumar Ranabhat, Tayler D Ross, Tony Jiao, Jeremie Larouche, Joel Finkelstein, Michael Hardisty

General AI

Surgical training involves didactic teaching, mentor-led learning, surgical skills laboratories, and direct exposure to surgery; however, increasing clinical pressures have limited operating room (OR) exposure. This work leverages virtual reality (VR) to provide a safe and immersive training environment. Existing VR tr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning

2026-04-29 · Wanyue Zhang, Wenxiang Wu, Wang Xu, Jiaxin Luo, Helu Zhi, Yibin Huang, Shuo Ren, Zitao Liu, Jiajun Zhang

General AI

Vision-language models (VLMs) have shown strong performance on static visual understanding, yet they still struggle with dynamic spatial reasoning that requires imagining how scenes evolve under egocentric motion. Recent efforts address this limitation either by scaling spatial supervision with synthetic data or by cou…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation

2026-04-30 · Andac Demir, Erik W. Anderson, Jeremy L. Jenkins, Srayanta Mukherjee

General AI

In this work, we introduce CellxPert, a scalable multimodal foundation model that unifies single-cell and spatial multi-omics within a common representation space. CellxPert jointly encodes transcriptomic (scRNA-seq), chromatin-accessibility (ATAC-seq), and surface-proteomic (CITE-seq) measurements, while directly inco…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 11.2

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

2026-04-30 · Chenxin Li, Zhengyang Tang, Huangxin Lin, Yunlong Lin, Shijue Huang, Shengyuan Liu, Bowen Ye, Rang Li, Lei Li, Benyou Wang, Yixuan Yuan

General AI

LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether …

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

Low Rank Adaptation for Adversarial Perturbation

2026-04-30 · Han Liu, Shanghao Shi, Yevgeniy Vorobeychik, Chongjie Zhang, Ning Zhang

General AI

Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved the training efficiency of Large Language Models (LLMs) by updating neural network layers using low-rank matrices. Since the generation of adversarial examples is an optimiz…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

2026-04-30 · Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin

General AI

Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are increasingly applied to critical tasks such as function and variable name recovery and type inference. However, despite the…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles

2026-04-30 · Zainab Rehan, Christian Medeiros Adriano, Sona Ghahremani, Holger Giese

General AI

Rule-based systems remain central in safety-critical domains but often struggle with scalability, brittleness, and goal misspecification. These limitations can lead to reward hacking and failures in formal verification, as AI systems tend to optimize for narrow objectives. In previous research, we developed a neuro-sym…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

2026-05-01 · Yutao Hou, Yihan Jiang, Yuhan Xie, Jian Yang, Liwen Zhang, Hailiang Huang, Guanhua Chen, Yun Chen

General AI

Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

Position: agentic AI orchestration should be Bayes-consistent

2026-05-01 · Theodore Papamarkou, Pierre Alquier, Matthias Bauer, Wray Buntine, Andrew Davison, Gintare Karolina Dziugaite, Maurizio Filippone, Andrew Y. K. Foong, Vincent Fortuin, Dimitris Fouskakis, Jes Frellsen, Eyke Hüllermeier, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Nikita Kotelevskii, Salem Lahlou, Yingzhen Li, Fang Liu, Clare Lyle, Thomas Möllenhoff, Konstantina Palla, Maxim Panov, Yusuf Sale, Kajetan Schweighofer, Artem Shelmanov, Siddharth Swaroop, Martin Trapp, Willem Waegeman, Andrew Gordon Wilson, Alexey Zaytsev

General AI

LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this p…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

GIFT: Guided Fine-Tuning and Transfer for Enhancing Instruction-Tuned Language Models

2026-05-02 · Zhiwen Ruan, Yichao Du, Jianjie Zheng, Longyue Wang, Yun Chen, Peng Li, Jinsong Su, Yang Liu, Guanhua Chen

General AI

A promising paradigm for adapting instruction-tuned language models is to learn task-specific updates on a pretrained base model and subsequently merge them into the instruction-tuned model. However, existing approaches typically treat the instruction-tuned model as a passive target that is only involved at the final m…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning for Speech and Reasoning Tasks

2026-05-03 · Zongqian Li, Yixuan Su, Han Zhou, Zihao Fu, Nigel Collier

General AI

Parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) have become essential for deploying large language models, yet their static parameter allocation remains suboptimal for inputs of varying complexity. We present Flexi-LoRA, a novel framework that dynamically adjusts LoRA ranks based on input comple…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

TrajRAG: Retrieving Geometric-Semantic Experience for Zero-Shot Object Navigation

2026-05-03 · Yiyao Wang, Sixian Zhang, Keming Zhang, Xinhang Song, Songjie Du, Shuqiang Jiang

General AI

Existing zero-shot Object Goal Navigation (ObjectNav) methods often exploit commonsense knowledge from large language or vision-language models to guide navigation. However, such knowledge arises from internet-scale text rather than embodied 3D experience, and episodic observations collected during navigation are typic…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

AlbumFill: Album-Guided Reasoning and Retrieval for Personalized Image Completion

2026-05-04 · Yu-Ju Tsai, Brian Price, Qing Liu, Luis Figueroa, Daniil Pakhomov, Zhihong Ding, Scott Cohen, Ming-Hsuan Yang

General AI

Personalized image completion aims to restore occluded regions in personal photos while preserving identity and appearance. Existing methods either rely on generic inpainting models that often fail to maintain identity consistency, or assume that suitable reference images are explicitly provided. In practice, suitable …

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

Bolek: A Multimodal Language Model for Molecular Reasoning

2026-05-04 · Frederic Grabowski, Jacek Szczerbiński, Maciej Jaśkowski, Kalina Jasińska-Kobus, Paweł Dąbrowski-Tumański, Tomasz Jetka, Bartosz Topolski

General AI

Molecular property models increasingly support high-stakes drug-discovery decisions, but their outputs are often difficult to audit: classical predictors return scores without rationale, while language models can produce fluent explanations weakly grounded in the input molecule. We introduce Bolek, a compact multimodal…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.2

DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation

2026-05-04 · Danil Tokhchukov, Veronika Morozova, Gonzalo Ferrer

General AI

Traditional Simultaneous Localization and Mapping (SLAM) algorithms rely heavily on the static environment assumption, which severely limits their applicability in real-world spaces populated by moving entities, such as pedestrians. In this work, we propose DynoSLAM, a tightly-coupled Dynamic GraphSLAM architecture tha…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.0

Self-Execution Simulation Improves Coding Models

2026-03-11 · Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi

General AI

A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and tha…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.0

Evidence of an Emergent "Self" in Continual Robot Learning

2026-03-25 · Adidev Jhunjhunwala, Judah Goldfeder, Hod Lipson

Research Track A

A key challenge to understanding self-awareness has been a principled way of quantifying whether an intelligent system has a concept of a "self," and if so how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of cognitive process th…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 11.0

Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

2026-03-27 · Nicholas Edwards, Sebastian Schuster

General AI

As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally resolve underspecification by asking clarifying questions, current agents are largely optimize…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.0

Story2Proposal: A Scaffold for Structured Scientific Paper Writing

2026-03-28 · Zhuoyang Qian, Wei Shi, Xu Lin, Li Ling, Meng Luo, Ziming Wang, Zhiwei Zhang, Tengyue Xu, Gaoge Liu, Zhentao Zhang, Shuo Zhang, Ziqi Wang, Zheng Feng, Yan Luo, Shu Xu, Yongjin Chen, Zhibo Feng, Zhuo Chen, Bruce Yuan, Biao Wu, Harry Wang, Kris Chen

General AI

Generating scientific manuscripts requires maintaining alignment between narrative reasoning, experimental evidence, and visual artifacts across the document lifecycle. Existing language-model generation pipelines rely on unconstrained text synthesis with validation applied only after generation, often producing struct…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.0

SNID-SAGE: A Modern Framework for Interactive Supernova Classification and Spectral Analysis

2026-03-30 · Fiorenzo Stoppa, Stephen J. Smartt

Research Track A

We present SNID-SAGE (SuperNova IDentification-Spectral Analysis and Guided Exploration), a framework for supernova spectral classification with both a fully interactive graphical interface and a scriptable command-line pipeline for large-scale processing. The pipeline combines deterministic spectral preprocessing, FFT…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.0

ASI-Evolve: AI Accelerates AI

2026-03-31 · Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, Pengfei Liu

General AI

Can AI accelerate the development of AI itself? While recent agentic systems have shown strong performance on well-scoped tasks with rapid feedback, it remains unclear whether they can tackle the costly, long-horizon, and weakly supervised research loops that drive real AI progress. We present ASI-Evolve, an agentic fr…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.0

SeGPruner: Semantic-Geometric Visual Token Pruner for 3D Question Answering

2026-03-31 · Wenli Li, Kai Zhao, Haoran Jiang, Enquan Yang, Yi Su, Dan Zeng

General AI

Vision-language models (VLMs) have been widely adopted for 3D question answering (3D QA). In typical pipelines, visual tokens extracted from multiple viewpoints are concatenated with language tokens and jointly processed by a large language model (LLM) for inference. However, aggregating multi-view observations inevita…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.0

Forecasting Supply Chain Disruptions with Foresight Learning

2026-04-01 · Benjamin Turtel, Paul Wilczewski, Kris Skotheim

General AI

Anticipating supply chain disruptions before they materialize is a core challenge for firms and policymakers alike. A key difficulty is learning to reason reliably about infrequent, high-impact events from noisy and unstructured inputs - a setting where general-purpose models struggle without task-specific adaptation. …

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.0

A Wasserstein Geometric Framework for Hebbian Plasticity

2026-04-17 · Ulrich Tan

Research Track A · General AI

We introduce the Tan-HWG framework (Hebbian-Wasserstein-Geometry), a geometric theory of Hebbian plasticity in which memory states are modeled as probability measures evolving through Wasserstein minimizing movements. Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition,…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.0

Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models

2026-04-23 · Mohammed Safi Ur Rahman Khan, Sanjay Suryanarayanan, Tushar Anand, Mitesh M. Khapra

General AI

Large Vision-Language Models (VLMs) are increasingly used to evaluate outputs of other models, for image-to-text (I2T) tasks such as visual question answering, and text-to-image (T2I) generation tasks. Despite this growing reliance, the reliability of these Evaluator VLMs remains underexplored. In this work, we system…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.0

AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval

2026-04-25 · Yihan Wang, Lei Li, Yao Lai, Jing Wang, Yan Lu

General AI

Analog circuit design relies heavily on reusing existing intellectual property (IP), yet searching across heterogeneous representations such as SPICE netlists, schematics, and functional descriptions remains challenging. Existing methods are largely limited to exact matching within a single modality, failing to capture…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 11.0

AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark

2026-04-27 · Hongxin Li, Xiping Wang, Jingran Su, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang

Research Track B · General AI

Autonomous agents capable of navigating Graphical User Interfaces (GUIs) hold the potential to revolutionize digital productivity. However, achieving true digital autonomy extends beyond reactive element matching; it necessitates a predictive mental model of interface dynamics and the ability to foresee the "digital wo…

Review
pending
Role
unreviewed
Read
now
huggingface Score 11.0

Do not copy and paste! Rewriting strategies for code retrieval

2026-05-08 · Andrea Gurioli, Federico Pennino, Maurizio Gabbrielli

General AI

Embedding-based code retrieval often suffers when encoders overfit to surface syntax. Prior work mitigates this by using LLMs to rephrase queries and corpora into a normalized style, but leaves two questions open: how much representational shift helps, and when is the per-query LLM call justified? We study a hierarchy …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.8

SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce

2026-02-01 · Alberto Castelo, Zahra Zanjani Foumani, Ailin Fan, Keat Yang Koay, Vibhor Malik, Yuanzheng Zhu, Han Li, Meysam Feghhi, Ronie Uliana, Shuang Xie, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Lingyun Wang, Zhong Wu

Research Track B · General AI

A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model agents op…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.8

The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense

2026-03-24 · Qianlong Lan, Anuj Kaul

Research Track B · General AI

Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage spli…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Natural-Language Agent Harnesses

2026-03-26 · Linyue Pan, Lexiao Zou, Shuo Guo, Jingchen Ni, Hai-Tao Zheng

General AI

Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead be externaliz…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Social Hippocampus Memory Learning

2026-03-26 · Liping Yi, Zhiming Zhao, Qinghua Hu

General AI

Social learning highlights that learning agents improve not in isolation, but through interaction and structured knowledge exchange with others. When introduced into machine learning, this principle gives rise to social machine learning (SML), where multiple agents collaboratively learn by sharing abstracted knowledge.…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Wan-Weaver: Interleaved Multi-modal Generation via Decoupled Training

2026-03-26 · Jinbo Xing, Zeyinzi Jiang, Yuxiang Tuo, Chaojie Mao, Xiaotang Gai, Xi Chen, Jingfeng Zhang, Yulin Pan, Zhen Han, Jie Xiao, Keyu Yan, Chenwei Xie, Chongyang Zhong, Kai Zhu, Tong Shen, Lianghua Huang, Yu Liu, Yujiu Yang

General AI

Recent unified models have made unprecedented progress in both understanding and generation. However, while most of them accept multi-modal inputs, they typically produce only single-modality outputs. This challenge of producing interleaved content is mainly due to training data scarcity and the difficulty of modeling …

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

AMIGO: Agentic Multi-Image Grounding Oracle Benchmark

2026-03-30 · Min Wang, Ata Mahjoubfar

General AI

Agentic vision-language models increasingly act through extended interactions, but most evaluations still focus on single-image, single-turn correctness. We introduce AMIGO (Agentic Multi-Image Grounding Oracle Benchmark), a long-horizon benchmark for hidden-target identification over galleries of visually similar imag…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

C2RustXW: Program-Structure-Aware C-to-Rust Translation via Program Analysis and LLM

2026-03-30 · Yanyan Yan, Yang Feng, Jiangshan Liu, Di Liu, Zixi Liu, Hao Teng, Baowen Xu

General AI

The growing adoption of Rust for its memory safety and performance has increased the demand for effective migration of legacy C codebases. However, existing rule-based translators (e.g., C2Rust) often generate verbose, non-idiomatic code that preserves unsafe C semantics, limiting readability, maintainability, and pr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Less Is More? Selective Visual Attention to High-Importance Regions for Multimodal Radiology Summarization

2026-03-31 · Mst. Fahmida Sultana Naznin, Adnan Ibney Faruq, Mushfiqur Rahman, Niloy Kumar Mondal, Md. Mehedi Hasan Shawon, Md Rakibul Hasan

General AI

Automated radiology report summarization aims to distill verbose findings into concise clinical impressions, but existing multimodal models often struggle with visual noise and fail to meaningfully improve over strong text-only baselines in the FINDINGS → IMPRESSION transformation. We challenge two prevailing assum…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs

2026-03-31 · Zhuowen Liang, Xiaotian Lin, Zhengxuan Zhang, Yuyu Luo, Haixun Wang, Nan Tang

General AI

Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, graph, or chunks) to support r…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Video Models Reason Early: Exploiting Plan Commitment for Maze Solving

2026-03-31 · Kaleb Newman, Tyler Zhu, Olga Russakovsky

General AI

Video diffusion models exhibit emergent reasoning capabilities like solving mazes and puzzles, yet little is understood about how they reason during generation. We take a first step towards understanding this and study the internal planning dynamics of video models using 2D maze solving as a controlled testbed. Our inv…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models

2026-04-02 · Qiyao Zhang, Shuhua Zheng, Jianli Sun, Chengxiang Li, Xianke Wu, Zihan Song, Zhiyong Cui, Yisheng Lv, Yonglin Tian

General AI

Embodied visual tracking is crucial for Unmanned Aerial Vehicles (UAVs) executing complex real-world tasks. In dynamic urban scenarios with complex semantic requirements, Vision-Language-Action (VLA) models show great promise due to their cross-modal fusion and continuous action generation capabilities. To benchmark mu…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

VISTA: Visualization of Token Attribution via Efficient Analysis

2026-04-02 · Syed Ahmed, Bharathi Vokkaliga Ganesh, Jagadish Babu P, Karthick Selvaraj, Praneeth Talluri, Sanket Hingne, Anubhav Kumar, Anushka Yadav, Pratham Kumar Verma, Kiranmayee Janardhan, Mandanna A N

General AI

Understanding how Large Language Models (LLMs) process information from prompts remains a significant challenge. To shed light on this "black box," attention visualization techniques have been developed to capture neuron-level perceptions and interpret how models focus on different parts of input data. However, many ex…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models

2026-04-03 · Yunfei Bai, Amit Dhanda, Shekhar Jain

General AI

The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension, particularly for Chart Question Answering (CQA) tasks involving complex data vi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

AnyUser: Translating Sketched User Intent into Domestic Robots

2026-04-06 · Songyuan Yang, Huibin Tan, Kailun Yang, Wenjing Yang, Shaowu Yang

General AI

We introduce AnyUser, a unified robotic instruction system for intuitive domestic task instruction via free-form sketches on camera images, optionally with language. AnyUser interprets multimodal inputs (sketch, vision, language) as spatial-semantic primitives to generate executable robot actions requiring no prior map…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

ClawBench: Can AI Agents Complete Everyday Online Tasks?

2026-04-09 · Yuxuan Zhang, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao, Xuan Lu, Wendong Xu, Yunzhuo Hao, Songcheng Cai, Xiaochen Wang, Huaisong Zhang, Xian Wu, Yi Lu, Minyi Lei, Kai Zou, Huifeng Yin, Ping Nie, Liang Chen, Dongfu Jiang, Wenhu Chen, Kelsey R. Allen

General AI

AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that people need to accom…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

2026-04-28 · Jiahang Lin, Shichun Liu, Chengjun Pan, Lizhi Lin, Shihan Dou, Xuanjing Huang, Hang Yan, Zhenhua Han, Tao Gui

General AI

Harnesses have become a central determinant of coding-agent performance, shaping how models interact with repositories, tools, and execution environments. Yet automating harness engineering is hard: a heterogeneous action space, sparse and noisy evaluation signal, multi-million-token trajectories, and edits whose effec…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

2026-04-28 · Ajmain Inqiad Alam, Palash Roy, Chanchal K. Roy, Banani Roy, Kevin A. Schneider

General AI

The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities in different SE tasks, they are unmanageably large, slow to deploy, memory-intensive, and carbon-heavy. Thi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

2026-04-28 · Jinxiang Meng, Shaoping Huang, Fangyu Lei, Jingyu Guo, Haoxiang Liu, Jiahao Su, Sihan Wang, Yao Wang, Enrui Wang, Ye Yang, Hongze Chai, Jinming Lv, Anbang Yu, Huangjing Zhang, Yitong Zhang, Yiming Huang, Zeyao Ma, Shizhu He, Jun Zhao, Kang Liu

General AI

Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and assumption of perfect intent. To bridge these gaps, we introduce DV-World, a…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

Explainable AI for Jet Tagging: A Comparative Study of GNNExplainer, GNNShap, and GradCAM for Jet Tagging in the Lund Jet Plane

2026-04-28 · Pahal D. Patel, Sanmay Ganguly

General AI

Graph neural networks such as ParticleNet and transformer based networks on point clouds such as ParticleTransformer achieve state-of-the-art performance on jet tagging benchmarks at the Large Hadron Collider, yet the physical reasoning behind their predictions remains opaque. We present different methods, i.e. perturb…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

From Threads to Trajectories: A Multi-LLM Pipeline for Community Knowledge Extraction from GitHub Issue Discussions

2026-04-28 · Nazia Shehnaz Joynab, Soneya Binta Hossain

General AI

Resolution of complex post-production issues in large-scale open-source software (OSS) projects requires significant cognitive effort, as developers need to go through long, unstructured and fragmented issue discussion threads before that. In this paper, we present SWE-MIMIC-Bench, an issue trajectory dataset generated…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.8

PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation

2026-05-06 · Srikar Kashyap Pulipaka

General AI

We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language mode…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.8

SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

2026-05-07 · Xiaofang Xiao, Guangchao Li, Guangrong Zhao, Qi Lin, Wen Ma, Hongkai Wen, Yanxiang Wang, Yiran Shen

General AI

Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.5

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

2026-02-10 · Talor Abramovich, Maor Ashkenazi, Carl Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman

General AI

Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and representative workloads are essential for accurately measuring its effectiveness. Existin…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.5

Safe and Scalable Web Agent Learning via Recreated Websites

2026-03-11 · Hyungjoo Chae, Jungsoo Park, Alan Ritter

Research Track B · General AI

Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites in…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.5

STEM Agent: A Self-Adapting, Tool-Enabled, Extensible Architecture for Multi-Protocol AI Agent Systems

2026-03-22 · Alfred Shen, Aaron Shen

Research Track A · General AI

Current AI agent frameworks commit early to a single interaction protocol, a fixed tool integration strategy, and static user models, limiting their deployment across diverse interaction paradigms. To address these constraints, we introduce STEM Agent (Self-adapting, Tool-enabled, Extensible, Multi-agent), a modular ar…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.5

Learn by Surprise, Commit by Proof

2026-04-02 · Kang-Sin Choi

Research Track A · General AI

We propose LSCP, a self-gated post-training framework for autonomous knowledge acquisition: learning only what a model does not already know, verified against what it does know, at a strength proportional to conviction, with no external oracle. When a passage produces anomalously high per-token loss, LSCP flags it, gen…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.5

Fast Heterogeneous Serving: Scalable Mixed-Scale LLM Allocation for SLO-Constrained Inference

2026-04-08 · Jiaming Cheng, Duong Tung Nguyen

Research Track A · General AI

Deploying large language model (LLM) inference at scale requires jointly selecting base models, provisioning heterogeneous GPUs, configuring parallelism, and distributing workloads under tight latency, accuracy, and budget constraints. Exact mixed-integer linear programming (MILP) approaches guarantee optimality but sc…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain

2026-04-12 · Song Jin, Juntian Zhang, Xun Zhang, Zeying Tian, Fei Jiang, Guojun Yin, Wei Lin, Yong Liu, Rui Yan

General AI

Recent advancements in Vision-Language Models (VLMs) have revolutionized general visual understanding. However, their application in the food domain remains constrained by benchmarks that rely on coarse-grained categories, single-view imagery, and inaccurate metadata. To bridge this gap, we introduce DiningBench, a hie…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

2026-04-12 · Udari Madhushani Sehwag, Elaine Lau, Haniyeh Ehsani Oskouie, Shayan Shabihi, Erich Liang, Andrea Toledo, Guillermo Mangialardi, Sergio Fonrouge, Ed-Yeremai Hernandez Cardona, Paula Vergara, Utkarsh Tyagi, Chen Bo Calvin Zhang, Pavi Bhatter, Nicholas Johnson, Furong Huang, Ernesto Gabriel Hernandez Montoya, Bing Liu

General AI

Accelerating scientific discovery requires the identification of which experiments would yield the best outcomes before committing resources to costly physical validation. While existing benchmarks evaluate LLMs on scientific knowledge and reasoning, their ability to predict experimental outcomes - a task where AI coul…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

2026-04-15 · Team HY-World, Chenjie Cao, Xuhui Zuo, Zhenwei Wang, Yisu Zhang, Junta Wu, Zhenyang Liu, Yuning Gong, Yang Liu, Bo Yuan, Chao Zhang, Coopers Li, Dongyuan Guo, Fan Yang, Haiyu Zhang, Hang Cao, Jianchen Zhu, Jiaxin Lin, Jie Xiao, Jihong Zhang, Junlin Yu, Lei Wang, Lifu Wang, Lilin Wang, Linus, Minghui Chen, Peng He, Penghao Zhao, Qi Chen, Rui Chen, Rui Shao, Sicong Liu, Wangchen Qin, Xiaochuan Niu, Xiang Yuan, Yi Sun, Yifei Tang, Yifu Sun, Yihang Lian, Yonghao Tan, Yuhong Liu, Yuyang Yin, Zhiyuan Min, Tengfei Wang, Chunchao Guo

General AI

We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view image inputs, the mo…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

An Optimal Transport-driven Approach for Cultivating Latent Space in Online Incremental Learning

2026-04-16 · Quyen Tran, Hai Nguyen, Hoang Phan, Quan Dao, Linh Ngo, Khoat Than, Dinh Phung, Dimitris Metaxas, Trung Le

General AI

In online incremental learning, data continuously arrives with substantial distributional shifts, creating a significant challenge because previous samples have limited replay value when learning a new task. Prior research has typically relied on either a single adaptive centroid or multiple fixed centroids to represen…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

2026-04-18 · Xinru Yan, Boxi Cao, Yaojie Lu, Hongyu Lin, Weixiang Zhou, Le Sun, Xianpei Han

General AI

Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this gap, we first systematically quantify modality preference of OLLMs using …

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

HSG: Hyperbolic Scene Graph

2026-04-19 · Liyang Wang, Zeyu Zhang, Hao Tang

General AI

Scene graph representations enable structured visual understanding by modeling objects and their relationships, and have been widely used for multiview and 3D scene reasoning. Existing methods such as MSG learn scene graph embeddings in Euclidean space using contrastive learning and attention-based association. However…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.5

Mango: Multi-Agent Web Navigation via Global-View Optimization

2026-04-20 · Weixi Tong, Yifeng Di, Tianyi Zhang

Research Track B · General AI

Existing web agents typically initiate exploration from the root URL, which is inefficient for complex websites with deep hierarchical structures. Without a global view of the website's structure, agents frequently fall into navigation traps, explore irrelevant branches, or fail to reach target information within a lim…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.5

Mitigating Multimodal Hallucination via Phase-wise Self-reward

2026-04-20 · Yu Zhang, Chuyang Sun, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang

General AI

Large Vision-Language Models (LVLMs) still struggle with vision hallucination, where generated responses are inconsistent with the visual input. Existing methods either rely on large-scale annotated data for fine-tuning, which incurs massive computational overhead, or employ static post-hoc strategies that overlook the…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows

2026-04-22 · Hardy Chen, Nancy Lau, Haoqin Tu, Shuo Yan, Xiangyan Liu, Zijun Wang, Juncheng Wu, Michael Qizhe Shieh, Alvaro A. Cardenas, Cihang Xie, Yuyin Zhou

General AI

Frontier coding agents are increasingly used in workflows where users supervise progress primarily through repeated improvement of a public score, namely the reported score on a public evaluation file with labels in the workspace, rather than through direct inspection of the agent's intermediate outputs. We study wheth…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.5

ImageHD: Energy-Efficient On-Device Continual Learning of Visual Representations via Hyperdimensional Computing

2026-04-23 · Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna

Research Track A · General AI

On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative thr…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

2026-04-24 · Zhengxu Yu, Yu Fu, Zhiyuan He, Yuxuan Huang, Lee Ka Yiu, Meng Fang, Weilin Luo, Jun Wang

General AI

Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. We argue that this reflects a deeper absence: a principled organisational layer that gove…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation

2026-04-25 · Yizheng Huang, Wenjun Zeng, Aditi Kumaresan, Zi Wang

General AI

Evaluating generative AI models is increasingly resource-intensive due to slow inference, expensive raters, and a rapidly growing landscape of models and benchmarks. We propose ProEval, a proactive evaluation framework that leverages transfer learning to efficiently estimate performance and identify failure cases. ProE…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.5

PageGuide: Browser extension to assist users in navigating a webpage and locating information

2026-04-26 · Tin Nguyen, Thang T. Truong, Runtao Zhou, Trung Bui, Chirag Agarwal, Anh Totti Nguyen

Research Track B · General AI

Users browsing the web daily struggle to quickly locate relevant information in cluttered pages, complete unfamiliar multi-step tasks, and stay focused amid distracting content. State-of-the-art AI assistants (e.g., ChatGPT, Gemini, Claude) and browser agents (e.g., OpenAI Operator, Browser Use) can answer questions an…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

2026-04-26 · Qi Li, Bo Yin, Weiqi Huang, Ruhao Liu, Bojun Zou, Runpeng Yu, Jingwen Ye, Weihao Yu, Xinchao Wang

General AI

Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language, and state, real-time…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.5

ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning

2026-04-27 · Yiming Zhang, Jiacheng Chen, Jiaqi Tan, Yongsen Mao, Wenhu Chen, Angel X. Chang

General AI

Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally curated for traditional 3D perception. When such annotations are treated as ground truth …

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.5

WAAA! Web Adversaries Against Agentic Browsers

2026-05-06 · Sohom Datta, Alex Nahapetyan, William Enck, Alexandros Kapravelos

Research Track B · General AI

Large language models (LLMs) are increasingly being integrated into web browsers to create agentic browsing systems that execute actions on behalf of the user. Prior work considering the security of agentic browsers focuses exclusively on indirect prompt-injection attacks. However, by failing to consider traditional we…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.5

GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

2026-05-07 · Pranav Mantini, Shishir K. Shah

Research Track A

We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed in…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.5

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

2026-05-12 · Xinjie Shen, Rongzhe Wei, Peizhi Niu, Haoyu Wang, Ruihan Wu, Eli Chien, Bo Li, Pin-Yu Chen, Pan Li

General AI

Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful objective in a single prompt, increasingly capable attackers can distribute their intent across multiple benign-looking turns. Recent studies show that even modern commercial mo…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.3

Agentic Compilation: Mitigating the LLM Rerun Crisis for Minimized-Inference-Cost Web Automation

2026-04-08 · Jagadeesh Chundru

Research Track B · General AI

LLM-driven web agents operating through continuous inference loops (repeatedly querying a model to evaluate browser state and select actions) exhibit a fundamental scalability constraint for repetitive tasks. We characterize this as the Rerun Crisis: the linear growth of token expenditure and API latency relative t…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.3

A Mechanistic Analysis of Looped Reasoning Language Models

2026-04-13 · Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong

General AI

Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics d…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Enhancing Program Repair with Specification Guidance and Intermediate Behavioral Signals

2026-04-13 · Minh Le-Anh, Cuong Chi Le, Tien N. Nguyen

General AI

Automated Program Repair (APR) has recently benefited from large language models (LLMs). However, most LLM-based APR approaches still rely primarily on coarse end-to-end signals from test-suite outcomes to guide repair, providing limited insight into where a program's internal logic deviates from its intended behavior.…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

2026-04-13 · Donghao Zhou, Guisheng Liu, Hao Yang, Jiatong Li, Jingyu Lin, Xiaohu Huang, Yichen Liu, Xin Gao, Cunjian Chen, Shilei Wen, Chi-Wing Fu, Pheng-Ann Heng

General AI

In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference images, audio, and pose. This task holds significant practical value for automating content creation in real-world applications, such as e-commer…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure

2026-04-13 · Federico Bottino, Carlo Ferrero, Nicholas Dosio, Pierfrancesco Beneventano

General AI

Organizational knowledge used by AI agents typically lacks epistemic structure: retrieval systems surface semantically relevant content without distinguishing binding decisions from abandoned hypotheses, contested claims from settled ones, or known facts from unresolved questions. We argue that the ceiling on organizat…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Causal Diffusion Models for Counterfactual Outcome Distributions in Longitudinal Data

2026-04-14 · Farbod Alinezhad, Jianfei Cao, Gary J. Young, Brady Post

General AI

Predicting counterfactual outcomes in longitudinal data, where sequential treatment decisions heavily depend on evolving patient states, is critical yet notoriously challenging due to complex time-dependent confounding and inadequate uncertainty quantification in existing methods. We introduce the Causal Diffusion Mode…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

2026-04-14 · Yecheng Wu, Song Han, Hai Cai

General AI

On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, standard OPD requires a live teacher inference server throughout training, resulting in substantial infrastructure overhead. In this work, we investigate whether on-policy distillation can be performed of…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

2026-04-14 · Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain

General AI

Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Although existing automated program repair techniques primarily focus on repairing memory corruption vulnerabilities, they struggle with logical vulnerabilities because of their …

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

AD4AD: Benchmarking Visual Anomaly Detection Models for Safer Autonomous Driving

2026-04-16 · Fabrizio Genilotti, Arianna Stropeni, Gionata Grotto, Francesco Borsatti, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto

General AI

The reliability of a machine vision system for autonomous driving depends heavily on its training data distribution. When a vehicle encounters significantly different conditions, such as atypical obstacles, its perceptual capabilities can degrade substantially. Unlike many domains where errors carry limited consequence…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling

2026-04-16 · Jianxuan Yang, Xinyue Guo, Zhi Cheng, Kai Wang, Lipan Zhang, Jinjie Hu, Qiang Ji, Yihua Cao, Yihao Meng, Zhaoyue Cui, Mengmei Liu, Meng Meng, Jian Luan

General AI

Recent advances in video-to-audio (V2A) generation enable high-quality audio synthesis from visual content, yet achieving robust and fine-grained controllability remains challenging. Existing methods suffer from weak textual controllability under visual-text conflict and imprecise stylistic control due to entangled tem…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency

2026-04-16 · Boyan Li, Ou Ocean Kun Hei, Yue Yu, Yuyu Luo

General AI

While Large Language Models (LLMs) demonstrate impressive proficiency in generating SQL queries, they fundamentally lack the capability to self-evaluate correctness without an execution oracle. This limitation creates a stark Generation-Selection Gap, where high potential accuracy (Pass@K) fails to translate into execu…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Exploring Agentic Visual Analytics: A Co-Evolutionary Framework of Roles and Workflows

2026-04-17 · Tianqi Luo, Leixian Shen, Yuyu Luo

General AI

Agentic visual analytics (VA) represents an emerging class of systems in which large language model (LLM)-driven agents autonomously plan, execute, evaluate, and iterate across the full visual analytics pipeline. By shifting users from low-level tool operations to high-level analytical goals expressed through natural l…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Using Large Language Models and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing

2026-04-17 · Thomas Bayer, Alexander Lohr, Sarah Weiß, Bernd Michelberger, Wolfram Höpken

General AI

Explaining Machine Learning (ML) results in a transparent and user-friendly manner remains a challenging task of Explainable Artificial Intelligence (XAI). In this paper, we present a method to enhance the interpretability of ML models by using a Knowledge Graph (KG). We store domain-specific data along with ML results…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

2026-04-19 · Ziao Zhang, Kou Shi, Shiting Huang, Avery Nie, Yu Zeng, Yiming Zhao, Zhen Fang, Qishen Su, Haibo Qiu, Wei Yang, Qingnan Ren, Shun Zou, Wenxuan Huang, Lin Chen, Zehui Chen, Feng Zhao

Research Track A · General AI

As the capability frontier of autonomous agents continues to expand, they are increasingly able to complete specialized tasks through plug-and-play external skills. Yet current benchmarks mostly test whether models can use provided skills, leaving open whether they can discover skills from experience, repair them after…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources

2026-04-20 · Raghvendra Kumar, Devankar Raj, Sriparna Saha

General AI

India's linguistic landscape, spanning 22 scheduled languages and hundreds of marginalized dialects, has driven rapid growth in NLP datasets, benchmarks, and pretrained models. However, no dedicated survey consolidates resources developed specifically for Indian languages. Existing reviews either focus on a few high-re…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

ReCap: Lightweight Referential Grounding for Coherent Story Visualization

2026-04-20 · Aditya Arora, Akshita Gupta, Pau Rodriguez, Marcus Rohrbach

General AI

Story Visualization aims to generate a sequence of images that faithfully depicts a textual narrative while preserving character identity, spatial configuration, and stylistic coherence as the narrative unfolds. Maintaining such cross-frame consistency has traditionally relied on explicit memory banks, architectural expan…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

2026-04-20 · Weicheng Lin, Yi Zhang, Jiawei Dang, Liang-Jie Zhang

General AI

Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning method for large language models, with its effectiveness largely influenced by the allocation of ranks and scaling factors, as well as initialization. Existing LoRA variants typically address only one of these factors, often at the c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning

2026-04-21 · Yuan Zhuang, Yuexin Bian, Sihong He, Jie Feng, Qing Su, Songyang Han, Jonathan Petit, Shihao Ji, Yuanyuan Shi, Fei Miao

General AI

Scaling critic capacity is a promising direction for enhancing off-policy reinforcement learning (RL). However, larger critics are prone to overfitting and unstable in replay-buffer-based bootstrap training. This paper leverages Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy critics. Our…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

PC2Model: ISPRS benchmark on 3D point cloud to model registration

2026-04-21 · Mehdi Maboudi, Said Harb, Jackson Ferrao, Kourosh Khoshelham, Yelda Turkan, Karam Mawas

General AI

Point cloud registration involves aligning one point cloud with another or with a three-dimensional (3D) model, enabling the integration of multimodal data into a unified representation. This is essential in applications such as construction monitoring, autonomous driving, robotics, and virtual or augmented reality (VR…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

2026-04-21 · Abhinav Agarwal

General AI

LLM-assisted defect discovery has a precision crisis: plausible-but-wrong reports overwhelm maintainers and degrade credibility for real findings. We present Refute-or-Promote, an inference-time reliability pattern combining Stratified Context Hunting (SCH) for candidate generation, adversarial kill mandates, context a…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

EVENT5Ws: A Large Dataset for Open-Domain Event Extraction from Documents

2026-04-23 · Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi

General AI

Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Therefore, it is necessary to develop automated event extraction approaches. However, existing datasets for algorithm development…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Grounding Video Reasoning in Physical Signals

2026-04-23 · Alibay Osmanli, Zixu Cheng, Shaogang Gong

General AI

Physical video understanding requires more than naming an event correctly. A model can answer a question about pouring, sliding, or collision from textual regularities while still failing to localize the event in time or space. We introduce a grounded benchmark for physical video understanding that extends the what–wh…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Seeing Fast and Slow: Learning the Flow of Time in Videos

2026-04-23 · Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma

General AI

How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual conc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

2026-04-23 · Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu, Peng Di

General AI

Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, extracting actionabl…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines

2026-04-24 · Negar Arabzadeh, Andrew Drozdov, Michael Bendersky, Matei Zaharia

General AI

Large Language Models (LLMs) have made query reformulation ubiquitous in modern retrieval and Retrieval-Augmented Generation (RAG) pipelines, enabling the generation of multiple semantically equivalent query variants. However, executing the full pipeline for every reformulation is computationally expensive, motivating …

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges

2026-04-24 · Hyo Jin Jon, Longbin Jin, Eun Yi Kim

General AI

CLIP has demonstrated strong generalization in visual domains through natural language supervision, even for video action recognition. However, most existing approaches that adapt CLIP for action recognition have primarily focused on temporal modeling, often overlooking spatial perception. In real-world scenarios, visu…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

GazeVLA: Learning Human Intention for Robotic Manipulation

2026-04-24 · Chengyang Li, Kaiyi Xiong, Yuan Xu, Lei Qian, Yizhou Wang, Wentao Zhu

General AI

Embodied foundation models have achieved significant breakthroughs in robotic manipulation, yet they still depend heavily on large-scale robot demonstrations. Although recent works have explored leveraging human data to alleviate this dependency, effectively extracting transferable knowledge remains a significant chall…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

QuantClaw: Precision Where It Matters for OpenClaw

2026-04-24 · Manyi Zhang, Ji-Fu Li, Zhongao Sun, Xiaohao Liu, Zhenhua Dong, Xianzhi Yu, Haoli Bai, Xiaobo Xia

General AI

Autonomous agent systems such as OpenClaw introduce significant efficiency challenges due to long-context inputs and multi-turn reasoning. This results in prohibitively high computational and monetary costs in real-world development. While quantization is a standard approach for reducing cost and latency, its impact on…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft

2026-04-27 · Zhou Ziheng, Huacong Tang, Jinyuan Zhang, Haowei Lin, Bangcheng Yang, Qian Long, Fang Sun, Yizhou Sun, Yitao Liang, Ying Nian Wu, Demetri Terzopoulos, Xiaofeng Gao

Research Track A · General AI

Discovering causal regularities and applying them to build functional systems, the discovery-to-application loop, is a hallmark of general intelligence, yet evaluating this capacity has been hindered by the vast complexity gap between scientific discovery and real-world engineering. We introduce SciCrafter, a Minecraft…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis

2026-04-27 · Amal Akli, Mike Papadakis, Maxime Cordy, Yves Le Traon

General AI

Large language models are widely used for code generation, yet they rely on an implicit assumption that the task descriptions are sufficiently detailed and well-formed. However, in practice, users may provide defective descriptions, which can have a strong effect on code correctness. To address this issue, we develop S…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Meta-CoT: Enhancing Granularity and Generalization in Image Editing

2026-04-27 · Shiyi Zhang, Yiji Cheng, Tiankai Hang, Zijin Yin, Runze He, Yu Xu, Wenxun Dai, Yunlong Lin, Chunyu Wang, Qinglin Lu, Yansong Tang

General AI

Unified multi-modal understanding/generative models have shown improved image editing performance by incorporating fine-grained understanding into their Chain-of-Thought (CoT) process. However, a critical question remains underexplored: what forms of CoT and training strategy can jointly enhance both the understanding …

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.3

Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation

2026-05-12 · Yabo Zhang, Kunchang Li, Dewei Zhou, Xinyu Huang, Xun Wang

General AI

While recent advancements in multimodal language models have enabled image generation from expressive multi-image instructions, existing methods struggle to maintain performance under complex interleaved instructions. This limitation stems from the structural separation of images and text in current paradigms, which fo…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.2

Degree-dependent and distance-dependent contact rates interpolate between explosive, exponential and polynomial epidemic growth

2026-04-29 · Zylan Benjert, Júlia Komjáthy, Johannes Lengler, John Lapinskas, Ulysse Schaller

General AI

It is a fundamental question in epidemiology to estimate, model and predict the growth rate of a pandemic. Analogously, analysing the diffusion of innovation, (fake) news, memes, and rumours is of key importance in the social sciences. The resulting epidemic growth curves can be classified according to their growth rat…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.2

Domain-Adapted Small Language Models for Reliable Clinical Triage

2026-04-29 · Manar Aljohani, Brandon Ho, Kenneth McKinley, Dennis Ren, Xuan Wang

General AI

Accurate and consistent Emergency Severity Index (ESI) assignment remains a persistent challenge in emergency departments, where highly variable free-text triage documentation contributes to mistriage and workflow inefficiencies. This study evaluates whether open-source small language models (SLMs) can serve as reliabl…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.2

Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel

2026-04-29 · Yiqi Liu, Noelle Crawford, Michael Wang, Jilong Xue, Jian Huang

General AI

To overcome the well-known memory bottleneck of AI chips, 3D stacked architectures that employ advanced packaging technology with high-density through-silicon vias (TSVs) pins have proven to be a promising solution. The 3D-stacked AI chip enables ultra-high memory bandwidth between compute and memory by stacking numero…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.2

LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis

2026-04-30 · Lincan Li, Zheng Chen, Yushun Dong

General AI

Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging. Existing graph construction methods, whether correlation-based or learning-based, often generate redundant or irrelevant edges due to the noisy nature of EEG data. Thi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.2

HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs

2026-05-01 · Jinpai Zhao, Nishant Panda, Yen Ting Lin, Eirik Valseth, Diane Oyen, Clint Dawson

General AI

We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closures, boundary handling) in a query-conditioned way. Rather than learning a monolithic map, HyCOP learns a policy over short programs - which module to apply and for how l…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.2

Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense

2026-05-04 · Mingming Zha, Xiaofeng Wang

General AI

Autonomous LLM agents operate as long-running processes with persistent workspaces, memory files, scheduled task state, and messaging integrations. These features create a new propagation risk: attacker-influenced content can be written into persistent agent state, re-enter the LLM decision context through scheduled au…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.2

FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents

2026-05-04 · Quang Hieu Pham, Yang He, Ping Nie, Canwen Xu, Davood Rafiei, Yuepeng Wang, Xi Ye, Jocelyn Qiaochu Chen

General AI

Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline where schema elements are retrieved once upfront and the database is only revisited for post-hoc repair, limiting recovery fr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.2

OphMAE: Bridging Volumetric and Planar Imaging with a Foundation Model for Adaptive Ophthalmological Diagnosis

2026-05-04 · Tienyu Chang, Zhen Chen, Renjie Liang, Jinyu Ding, Jie Xu, Sunu Mathew, Amir Reza Hajrasouliha, Andrew J. Saykin, Ruogu Fang, Yu Huang, Jiang Bian, Qingyu Chen

General AI

The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.2

TRACE: Temporal Reasoning over Context and Evidence for Activity Recognition in Smart Homes

2026-05-04 · Yingtian Shi, Abivishaq Balasubramanian, Jessica Herring, Jiachen Li, Juan Macias Romero, Rosemarie Santa Gonzalez, Varun Mishra, Agata Rozga, Xiang Zhi Tan, Thomas Plötz

General AI

Human activity recognition (HAR) in smart homes remains challenging because many daily activities exhibit similar local sensor patterns, while minimally intrusive sensing provides sparse and ambiguous observations. As a result, methods based on short temporal or event windows often fail to capture the broader temporal …

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.2

When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

2026-05-04 · Pehuén Moure, Niclas Pokel, Bilal Bounajma, Yingqiang Gao, Roman Boehringer, Longbiao Cheng, Shih-Chii Liu

General AI

Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models can make use of such information. We int…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.0

In-Browser Agents for Search Assistance

2026-01-14 · Saber Zerhoudi, Michael Granitzer

Research Track B · General AI

A fundamental tension exists between the demand for sophisticated AI assistance in web search and the need for user data privacy. Current centralized models require users to transmit sensitive browsing data to external services, which limits user control. In this paper, we present a browser extension that provides a vi…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.0

RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

2026-03-04 · Soroush Nasiriany, Sepehr Nasiriany, Abhiram Maddukuri, Yuke Zhu

Research Track A · General AI

Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present Rob…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.0

Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

2026-03-14 · Seokmin Lee, Yunghee Lee, Byeonghyun Pak, Byeongju Woo

General AI

For robotic agents operating in dynamic environments, learning visual state representations from streaming video observations is essential for sequential decision making. Recent self-supervised learning methods have shown strong transferability across vision tasks, but they do not explicitly address what a good visual …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.0

Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

2026-03-15 · Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee

General AI

Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improve LALM reasoning. We introduce three strategies using diverse i…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.0

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

2026-03-19 · Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang, Zhehuai Chen, Sung-Feng Huang, Chih-Kai Yang, Yi-Cheng Lin, Chi-Yuan Hsiao, Wenze Ren, En-Pei Hu, Yu-Han Huang, An-Yu Cheng, Cheng-Han Chiang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee

General AI

Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.0

AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

2026-03-22 · Liang Ding

Research Track B · General AI

LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER,…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.0

IntentWeave: A Progressive Entry Ladder for Multi-Surface Browser Agents in Cloud Portals

2026-03-24 · Wanying Mo, Jijia Lai, Xiaoming Wang

Research Track B · General AI

Browser agents built on LLMs can act in web interfaces, yet most remain confined to a single chat surface (e.g., a sidebar). This mismatch with real browsing can increase context-switching and reduce user control. We introduce IntentWeave, a design space of ten spatial paradigms for embedding agentic assistanc…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.0

DIET: Learning to Distill Dataset Continually for Recommender Systems

2026-03-26 · Jiaqing Zhang, Hao Wang, Mingjia Yin, Bo Chen, Qinglin Jia, Rui Zhou, Ruiming Tang, ChaoYi Ma, Enhong Chen

Research Track A · General AI

Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture comparison or iteration is prohibitively expensive, severely slowing down model deve…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.0

VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

2026-03-27 · Zhaochong An, Orest Kupyn, Théo Uscidda, Andrea Colaco, Karan Ahuja, Serge Belongie, Mar Gonzalez-Franco, Marta Tintore Gazulla

General AI

Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compromise the generaliza…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.0

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

2026-03-30 · Zhang Li, Zhibo Lin, Qiang Liu, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiajun Song, Jiarui Zhang, Xiang Bai, Yuliang Liu

General AI

We introduce Multilingual Document Parsing Benchmark, the first benchmark for multilingual digital and photographed document parsing. Document parsing has made remarkable strides, yet almost exclusively on clean, digital, well-formatted pages in a handful of dominant languages. No systematic benchmark exists to evaluat…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.0

FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration

2026-03-31 · Qiyao Wang, Hongbo Wang, Longze Chen, Zhihao Yang, Guhong Chen, Hamid Alinejad-Rokny, Hui Li, Yuan Lin, Min Yang

General AI

Scientific idea generation (SIG) is critical to AI-driven autonomous research, yet existing approaches are often constrained by a static retrieval-then-generation paradigm, leading to homogeneous and insufficiently divergent ideas. In this work, we propose FlowPIE, a tightly coupled retrieval-generation framework that …

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.0

Terminal Agents Suffice for Enterprise Automation

2026-03-31 · Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar

Research Track B · General AI

There has been growing interest in building agents that can interact with digital platforms to execute meaningful enterprise tasks autonomously. Among the approaches explored are tool-augmented agents built on abstractions such as Model Context Protocol (MCP) and web agents that operate through graphical interfaces. Ye…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.0

Less Detail, Better Answers: Degradation-Driven Prompting for VQA

2026-04-06 · Haoxuan Han, Weijie Wang, Zeyu Zhang, Yefei He, Bohan Zhuang

Research Track A · General AI

Recent advancements in Vision-Language Models (VLMs) have significantly pushed the boundaries of Visual Question Answering (VQA). However, high-resolution details can sometimes become noise that leads to hallucinations or reasoning errors. In this paper, we propose Degradation-Driven Prompting (DDP), a novel framework tha…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.0

Appear2Meaning: A Cross-Cultural Benchmark for Structured Cultural Metadata Inference from Images

2026-04-08 · Yuechen Jiang, Enze Zhang, Md Mohsinul Kabir, Qianqian Xie, Stavroula Golfomitsou, Konstantinos Arvanitis, Sophia Ananiadou

General AI

Recent advances in vision-language models (VLMs) have improved image captioning for cultural heritage. However, inferring structured cultural metadata (e.g., creator, origin, period) from visual input remains underexplored. We introduce a multi-category, cross-cultural benchmark for this task and evaluate VLMs using an…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.0

OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

2026-04-08 · Jianhui Liu, Haoze Sun, Wenbo Li, Yanbing Zhang, Rui Yang, Zhiliang Zhu, Yijun Yang, Shenghe Zheng, Nan Jiang, Jiaxiu Jiang, Haoyang Huang, Tien-Tsin Wong, Nan Duan, Xiaojuan Qi

General AI

Spatial understanding is a fundamental cornerstone of human-level intelligence. Nonetheless, current research predominantly focuses on domain-specific data production, leaving a critical void: the absence of a principled, open-source engine capable of fully unleashing the potential of high-quality spatial data. To brid…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.0

TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

2026-04-14 · Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang

Research Track A

Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal transferability across platforms. In this paper, we introduce TCL, a novel efficient an…

Review
pending
Role
unreviewed
Read
now
arxiv Score 10.0

Adaptive Unknown Fault Detection and Few-Shot Continual Learning for Condition Monitoring in Ultrasonic Metal Welding

2026-04-15 · Ahmadreza Eslaminia, Kuan-Chieh Lu, Klara Nahrstedt, Chenhui Shao

Research Track A

Ultrasonic metal welding (UMW) is widely used in industrial applications but is sensitive to tool wear, surface contamination, and material variability, which can lead to unexpected process faults and unsatisfactory weld quality. Conventional monitoring systems typically rely on supervised learning models that assume a…

Review
pending
Role
unreviewed
Read
now
huggingface Score 10.0

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

2026-04-27 · NVIDIA, Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Arushi Goel, Mike Ranzinger, Greg Heinrich, Guo Chen, Lukas Voegtle, Philipp Fischer, Timo Roman, Karan Sapra, Collin McCarthy, Shaokun Zhang, Fuxiao Liu, Hanrong Ye, Yi Dong, Mingjie Liu, Yifan Peng, Piotr Zelasko, Zhehuai Chen, Nithin Rao Koluguri, Nune Tadevosyan, Lilit Grigoryan, Ehsan Hosseini Asl, Pritam Biswas, Leili Tavabi, Yuanhang Su, Zhiding Yu, Peter Jin, Alexandre Milesi, Netanel Haber, Yao Xu, Sarah Amiraslani, Nabin Mulepati, Eric Tramel, Jaehun Jung, Ximing Lu, Brandon Cui, Jin Xu, Zhiqi Li, Shihao Wang, Yuanguo Kuang, Huck Yang, Boyi Li, Hongxu Yin, Song Han, Pavlo Molchanov, Adi Renduchintala, Charles Wang, David Mosallanezhad, Soumye Singhal, Luis Vega, Katherine Cheung, Sreyan Ghosh, Yian Zhang, Alexander Bukharin, Venkat Srinivasan, Johnny Greco, Andre Manoel, Maarten Van Segbroeck, Suseella Panguliri, Rohit Watve, Divyanshu Kakwani, Shubham Pachori, Jeffrey Glick, Radha Sri-Tharan, Aileen Zaman, Khanh Nguyen, Shi Chen, Jiaheng Fang, Qing Miao, Wenfei Zhou, Yu Wang, Zaid Pervaiz Bhat, Varun Praveen, Arihant Jain, Ramanathan Arunachalam, Tomasz Kornuta, Ashton Sharabiani, Amy Shen, Wei Huang, Yi-Fu Wu, Ali Roshan Ghias, Huiying Li, Brian Yu, Nima Tajbakhsh, Chen Cui, Wenwen Gao, Li Ding, Terry Kong, Manoj Kilaru, Anahita Bhiwandiwalla, Marek Wawrzos, Daniel Korzekwa, Pablo Ribalta, Grzegorz Chlebus, Besmira Nushi, Ewa Dobrowolska, Maciej Jakub Mikulski, Kunal Dhawan, Steve Huang, Jagadeesh Balam, Yongqiang Wang, Nikolay Karpov, Valentin Mendelev, George Zelenfroynd, Meline Mkrtchyan, Omri Almog, Bhavesh Pawar, Rameshwar Shivbhakta, Sudeep Sabnis, Ashrton Sharabiani, Negar Habibi, Geethapriya Venkataramani, Pamela Peng, Prerit Rodney, Serge Panev, Richard Mazzarese, Nicky Liu, Michael Fukuyama, Andrii Skliar, Roger Waleffe, Duncan Riach, Yunheng Zou, Jian Hu, Hao Zhang, 
Binfeng Xu, Yuhao Yang, Zuhair Ahmed, Carlo del Mundo, Chad Voegele, Zhiyu Cheng, Nave Assaf, Daniel Afrimi, Natan Bagrov, Ran Zilberstein, Ofri Masad, Eugene Khvedchenia, Borys Tymchenko, Tomer Asida, Parth Mannan, Victor Cui, Michael Evans, Katherine Luna, Jie Lou, Pinky Xu, Guyue Huang, Michael Boone, Pradeep Thalasta, Adeola Adesoba, Dina Yared, Christopher Parisien, Leon Derczynski, Shaona Ghosh, Wes Feely, Micah Schaffer, Barnaby Simkin, Tomasz Grzegorzek, Rishabh Garg, Aastha Jhunjhunwala, Sergei Kolchenko, Farzan Memarian, Haran Kumar, Shiv Kumar, Isabel Hulseman, Anjali Shah, Kari Briski, Padmavathy Subramanian, Joey Conway, Udi Karpas, Jane Polak Scowcroft, Annie Surla, Shilpa Ammireddy, Ellie Evans, Jesse Oliver, Tom Balough, Chia-Chih Chen, Sandip Bhaskar, Alejandra Rico, Bardiya Sadeghi, Seph Mard, Meredith Price, Laya Sleiman, Saori Kaji, Wesley Helmholz, Wendy Quan, Michael Lightstone, Jonathan Cohen, Jian Zhang, Oleksii Kuchaiev, Boris Ginsburg, Jan Kautz, Eileen Long, Mohammad Shoeybi, Mostofa Patwary, Oluwatobi Olabiyi, Andrew Tao, Bryan Catanzaro

Research Track B · General AI

We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements over its predecessor, Nemotron Nano V2 VL, across all modalities, enabled by advances in …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.0

BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate

2026-04-28 · Arnon Mazza, Elad Levi

General AI

Deploying guardrails for custom policies remains challenging, as generic safety models fail to capture task-specific requirements, while prompting LLMs suffers from inconsistent boundary-case performance and high inference costs. Training custom classifiers achieves both accuracy and efficiency, yet demands substantial…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Privacy Practices of Browser Agents

2025-12-08 · Alisha Ukani, Hamed Haddadi, Ali Shahin Shamsabadi, Peter Snyder

Research Track B · General AI

This paper presents a systematic evaluation of the privacy behaviors and attributes of eight recent, popular browser agents. Browser agents are software that automate Web browsing using large language models and ancillary tooling. However, the automated capabilities that make browser agents powerful also make them high…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.8

Cognitive Dark Matter: Measuring What AI Misses

2026-03-03 · Patrick J. Mineault, Thomas L. Griffiths, Sean Escola

Research Track A · General AI

We propose that the jagged intelligence landscape of modern AI systems arises from a missing training signal that we call "cognitive dark matter" (CDM): brain functions that meaningfully shape behavior yet are hard to infer from behavior alone. We identify key CDM domains -- metacognition, cognitive flexibility, episodic …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.8

Back to Basics: Revisiting ASR in the Age of Voice Agents

2026-03-26 · Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li, Alex Smola

General AI

Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which condi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

2026-03-26 · Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang, Mingxi Cheng, Qi Dai, Yuqing Yang, Lili Qiu, Zhendong Wang, Zhengyuan Yang, Xue Yang, Lijuan Wang, Ji Li, Chong Luo

General AI

Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-wo…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models

2026-03-26 · Hai X. Pham, David T. Hoffmann, Ricardo Guerrero, Brais Martinez

General AI

Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard neg…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment

2026-03-26 · Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang

General AI

The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteB…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

2026-03-30 · Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or

General AI

Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a wide range of generat…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

ContextClaim: A Context-Driven Paradigm for Verifiable Claim Detection

2026-03-31 · Yufeng Li, Rrubaa Panchendrarajan, Arkaitz Zubiaga

General AI

Verifiable claim detection asks whether a claim expresses a factual statement that can, in principle, be assessed against external evidence. As an early filtering stage in automated fact-checking, it plays an important role in reducing the burden on downstream verification components. However, existing approaches to cl…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

One-for-All: A Lightweight Stabilized and Parameter-Efficient Pre-trained LLM for Time Series Forecasting

2026-03-31 · Prasanjit Dey, Soumyabrata Dev, Bianca Schoen-Phelan

General AI

We address the challenge of adapting pre-trained Large Language Models (LLMs) for multivariate time-series analysis, where their deployment is often hindered by prohibitive computational and memory demands. Our solution, One-for-All, introduces Gaussian Rank-Stabilized Low-Rank Adapters (rsLoRA) to enable parameter-eff…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Is One Token All It Takes? Graph Pooling Tokens for LLM-based GraphQA

2026-04-01 · Ankit Grover, Lodovico Giaretta, Rémi Bourgerie, Sarunas Girdzijauskas

General AI

The integration of Graph Neural Networks (GNNs) with Large Language Models (LLMs) has emerged as a promising paradigm for Graph Question Answering (GraphQA). However, effective methods for encoding complex structural information into the LLM's latent space remain an open challenge. Current state-of-the-art architecture…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models

2026-04-02 · Sarath Shekkizhar, Romain Cosentino, Adam Earle

General AI

Standard LLM benchmarks evaluate the assistant turn: the model generates a response to an input, a verifier scores correctness, and the analysis ends. This paradigm leaves unmeasured whether the LLM encodes any awareness of what follows the assistant response. We propose user-turn generation as a probe of this gap: giv…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation

2026-04-02 · Chongjie Ye, Cheng Cao, Chuanyu Pan, Yiming Hao, Yihao Zhi, Yuanming Hu, Xiaoguang Han

General AI

Recent multimodal large language models have achieved strong performance in unified text and image understanding and generation, yet extending such native capability to 3D remains challenging due to limited data. Compared to abundant 2D imagery, high-quality 3D assets are scarce, making 3D synthesis under-constrained. …

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

2026-04-07 · Yanis Labrak, David Grünert, Séverin Baroudi, Jiyun Chun, Pawel Cyrta, Sergio Burdisso, Ahmed Hassoon, David Liu, Adam Rothschild, Reed Van Deusen, Petr Motlicek, Andrew Perrault, Ricard Marxer, Thomas Schaaf

General AI

Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to s…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models

2026-04-09 · Feng Luo, Yu-Neng Chuang, Guanchu Wang, Zicheng Xu, Xiaotian Han, Tianyi Zhang, Vladimir Braverman

General AI

On-policy distillation (OPD) trains student models under their own induced distribution while leveraging supervision from stronger teachers. We identify a failure mode of OPD: as training progresses, on-policy rollouts can undergo abrupt length inflation, causing truncated trajectories to dominate the training data. Th…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

ParseBench: A Document Parsing Benchmark for AI Agents

2026-04-09 · Boyang Zhang, Sebastián G. Acosta, Preston Carlson, Sacha Bron, Pierre-Loïc Doulcet, Simon Suo

General AI

AI agents are changing the requirements for document parsing. What matters is semantic correctness: parsed output must preserve the structure and meaning needed for autonomous decisions, including correct table structure, precise chart data, semantically meaningful formatting, and visual grounding. Existing benc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum

2026-04-28 · Chu-Cheng Lin, Eugene Ie

General AI

Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewards (RLVR) when the initial success probability $p_0$ is small. Using the Tsallis $q$-logarithm, we define a loss family $J_Q$ that interpolates between RLVR (at $q{=}0$…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Learning Generalizable Multimodal Representations for Software Vulnerability Detection

2026-04-28 · Zeming Dong, Yuejun Guo, Qiang Hu, Yao Zhang, Maxime Cordy, Hao Liu, Mike Papadakis, Yongqiang Lyu

General AI

Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information em…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Three Models of RLHF Annotation: Extension, Evidence, and Authority

2026-04-28 · Steve Coyne

General AI

Preference-based alignment methods, most prominently Reinforcement Learning with Human Feedback (RLHF), use the judgments of human annotators to shape large language model behaviour. However, the normative role of these judgments is rarely made explicit. I distinguish three conceptual models of that role. The first is …

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Towards Agentic Investigation of Security Alerts

2026-04-28 · Even Eilertsen, Vasileios Mavroeidis, Gudmund Grov

General AI

Security analysts are overwhelmed by the volume of alerts and the low context provided by many detection systems. Early-stage investigations typically require manual correlation across multiple log sources, a task that is usually time-consuming. In this paper, we present an experimental, agentic workflow that leverages…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

Variational Neural Belief Parameterizations for Robust Dexterous Grasping under Multimodal Uncertainty

2026-04-28 · Clinton Enwerem, Shreya Kalyanaraman, John S. Baras, Calin Belta

General AI

Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poo…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.8

EMO: Pretraining Mixture of Experts for Emergent Modularity

2026-05-07 · Ryan Wang, Akshita Bhagia, Sewon Min

General AI

Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset of experts per inpu…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.8

Quantifying Trade-Offs Between Stability and Goal-Obfuscation

2026-05-07 · Yixuan Wang, Dan Guralnik, Warren Dixon

General AI

Safety-critical autonomy in adversarial settings demands more than Lyapunov stability of tracking error signals. An agent executing a goal-directed trajectory is intrinsically legible to a passive observer running online Bayesian inference, because the contractive dynamics of any Lyapunov basin of attraction concentrat…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.8

Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models

2026-05-07 · Amir Ivry

General AI

Large audio language models (LALMs) are increasingly used to reason over long audio clips, yet deployment often compresses audio before inference to reduce memory and latency. The risk is that compression can leave aggregate accuracy acceptable while sharply degrading answers for a deployment-critical query family. We …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.8

From Expansion to Consolidation: Socio-Spatial Contagion Dynamics in Off-Grid PV Adoption

2026-05-10 · Roni Blushtein-Livnon, Tal Svoray, Itay Fischhendler, Havatzelet Yahel, Emir Galilee

Research Track A

In traditional rural societies, where social ties are embedded in physical space, the diffusion of emerging technologies may be amplified through socio-spatial contagion (SSC). Such processes may play a key role in accelerating residential PV adoption in off-grid regions. Yet empirical evidence on SSC in PV adoption re…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.5

Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

2026-03-24 · Connor Mclaughlin, Nigel Lee, Lili Su

Research Track A

Machine learning models often need to adapt to new data after deployment due to structured or unstructured real-world dynamics. The Continual Learning (CL) framework enables continuous model adaptation, but most existing approaches either assume each task contains sufficiently many data samples or that the learning tas…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.5

Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

2026-04-09 · Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo

Research Track A · General AI

Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. A field-wide goal is to achieve generalizable, cross-subject models. A major obstacle towards this goal is the substanti…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation

2026-04-10 · Aarush Sinha, Arion Das, Soumyadeep Nag, Charan Karnati, Shravani Nag, Chandra Vadhan Raj, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das

General AI

As large language models (LLMs) are increasingly deployed as autonomous agents, understanding how strategic behavior emerges in multi-agent environments has become an important alignment challenge. We take a neutral empirical stance and construct a controlled environment in which strategic behavior can be directly obse…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context

2026-04-13 · Shuquan Lian, Juncheng Liu, Yazhe Chen, Yuhong Chen, Hui Li

General AI

Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning required for deep analysis and handling complex edge cases. While recent reasoning models demonstrate the potential of extended Chain-of-Thought (CoT), applying them to the multi-turn SWE …

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.5

Lyra 2.0: Explorable Generative 3D Worlds

2026-04-14 · Tianchang Shen, Sherwin Bahmani, Kai He, Sangeetha Grama Srinivasan, Tianshi Cao, Jiawei Ren, Ruilong Li, Zian Wang, Nicholas Sharp, Zan Gojcic, Sanja Fidler, Jiahui Huang, Huan Ling, Jun Gao, Xuanchi Ren

Research Track A

Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, then lifting them to 3D via feed-forward reconstruction techniques. This generative reconstruction approach combines the visual fidelity and creative capacity of video m…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.5

A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation

2026-04-15 · Julian Killingback, Ofer Meshi, Henry Li, Hamed Zamani, Maryam Karimzadehgan

Research Track A · General AI

Traditional Retrieval-Augmented Generation (RAG) approaches generally assume that retrieval and generation occur on powerful servers removed from the end user. While this reduces local hardware constraints, it introduces significant drawbacks: privacy concerns regarding data access, recurring maintenance and storage co…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

2026-04-16 · Ido Galil, Moshe Kimhi, Ran El-Yaniv

General AI

Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of parameter bits. We introduce Deep Neural Lesion (DNL), a data-free and optimization-free method that locates critical parameters, and an enhanced single-pass variant, 1P-DNL, that refines this selection with one forward and backw…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training

2026-04-16 · Yifu Chen, Shengpeng Ji, Qian Chen, Tianle Liang, Yangzhuo Li, Ziqing Wang, Wen Wang, Jingyu Lu, Haoxiao Wang, Xueyi Pu, Fan Zhuo, Zhou Zhao

General AI

End-to-end spoken dialogue models have garnered significant attention because they offer a higher potential ceiling in expressiveness and perceptual ability than cascaded systems. However, the intelligence and expressiveness of current open-source spoken dialogue models often remain below expectations. Motivated by the…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers

2026-04-19 · Qingcheng Zeng, Yuheng Lu, Zeqi Zhou, Heli Qi, Puxuan Yu, Fuheng Zhao, Hitomi Yanaka, Weihao Xuan, Naoto Yokoya

General AI

Code-switching is a pervasive linguistic phenomenon in global communication, yet modern information retrieval systems remain predominantly designed for, and evaluated within, monolingual contexts. To bridge this critical disconnect, we present a holistic study dedicated to code-switching IR. We introduce CSR-L (Code-Sw…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

Dual-View Training for Instruction-Following Information Retrieval

2026-04-20 · Qingcheng Zeng, Puxuan Yu, Aman Mehta, Fuheng Zhao, Rajhans Samdani

General AI

Instruction-following information retrieval (IF-IR) studies retrieval systems that must not only find documents relevant to a query, but also obey explicit user constraints such as required attributes, exclusions, or output preferences. However, most retrievers are trained primarily for semantic relevance and often fai…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

Hybrid Policy Distillation for LLMs

2026-04-22 · Wenhong Zhu, Ruobing Xie, Rui Wang, Pengfei Liu

General AI

Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections bet…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.5

Learning Hippo: Multi-attractor Dynamics and Stability Effects in a Biologically Detailed CA3 Extension of Hopfield Networks

2026-04-22 · Daniele Corradetti, Renato Corradetti

Research Track A · General AI

We present a biologically detailed extension of the classical Hopfield/Marr auto-associative memory model for CA3, implementing ten populations (two asymmetric pyramidal subtypes, eight GABAergic interneuron classes), forty-seven compartments, multi-rule plasticity (recurrent Hebb, BCM anti-saturation, mossy-fiber shor…

Review
pending
Role
unreviewed
Read
now
huggingface Score 9.5

Encoder-Free Human Motion Understanding via Structured Motion Descriptions

2026-04-23 · Yao Zhang, Zhuchenyang Liu, Thomas Ploetz, Yu Xiao

General AI

The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-langua…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition

2026-04-23 · Kwan Yun, Changmin Lee, Ayeong Jeong, Youngseo Kim, Seungmi Lee, Junyong Noh

General AI

Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, exhibit severe brittleness under stylization. They often mis…

Review
pending
Role
unreviewed
Read
now
huggingface Score 9.5

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

2026-04-24 · Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, Yichen Zhu

General AI

Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy evaluation. In this paper, we propose dWorldEval, which uses a discrete diffusion world model as a scalable evaluation p…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data

2026-04-27 · Mohammadmehdi Ataei, Farzaneh Askari, Kamal Rahimi Malekshan, Pradeep Kumar Jayaraman

General AI

Computer-Aided Design (CAD) models are defined by their construction history: a parametric recipe that encodes design intent. However, existing large-scale 3D datasets predominantly consist of boundary representations (B-Reps) or meshes, stripping away this critical procedural information. To address this scarcity, we …

Review
pending
Role
unreviewed
Read
now
huggingface Score 9.5

Generative Quantum-inspired Kolmogorov-Arnold Eigensolver

2026-05-06 · Yu-Cheng Lin, Yu-Chao Hsu, I-Shan Tsai, Chun-Hua Lin, Kuo-Chung Peng, Jiun-Cheng Jiang, Yun-Yuan Wang, Tzung-Chi Huang, Tai-Yue Li, Kuan-Cheng Chen, Samuel Yen-Chi Chen, Nan-Yow Chen

General AI

High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum circuit simulation, and selected configuration interaction postprocessing. We present the generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE), a parameter-ef…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

Debiased Model-based Representations for Sample-efficient Continuous Control

2026-05-12 · Jiafei Lyu, Zichuan Lin, Scott Fujimoto, Kai Yang, Yangkun Chen, Saiyong Yang, Zongqing Lu, Deheng Ye

General AI

Model-based representations recently stand out as a promising framework that embeds latent dynamics information into the representations for downstream off-policy actor-critic learning. It implicitly combines the advantages of both model-free and model-based approaches while avoiding the training costs associated with …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

2026-05-12 · Haoyu Zhang, Zeyu Zhang, Zedong Zhou, Yang Zhao, Hao Tang

General AI

Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increas…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.4

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

2026-04-29 · Zhen Zhang, Changyi Yang, Zijie Xia, Zhen Yang, Chengzhi Liu, Zhaotiao Weng, Yepeng Liu, Haobo Chen, Jin Pan, Chenyang Zhao, Yuheng Bu, Alkesh Patel, Zhe Gan, Xin Eric Wang

General AI

The token serves as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introd…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.4

Online Self-Calibration Against Hallucination in Vision-Language Models

2026-05-01 · Minghui Chen, Chenxu Yang, Hengjie Zhu, Dayan Wu, Zheng Lin, Qingyi Si

General AI

Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Pe…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.4

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

2026-05-01 · Houyuan Chen, Hong Li, Xianghao Kong, Tianrui Zhu, Shaocong Xu, Weiqing Xiao, Yuwei Guo, Chongjie Ye, Lvmin Zhang, Hao Zhao, Anyi Rao

General AI

Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unif…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.4

Perceptual Flow Network for Visually Grounded Reasoning

2026-05-04 · Yangfu Li, Yuning Gong, Hongjian Zhan, Teng Li, Yuanhuiyi Lyu, Tianyi Chen, Qi Liu, Ziyuan Huang, Zhihang Zhong, Dandan Zheng, Yue Lu

General AI

Despite the success of Large-Vision Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods introduce geometric priors from visual experts as additional supervision. However, we obs…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.3

Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions

2026-04-12 · Wenhao Zhang, Lin Mu, Li Ni, Peiquan Jin, Yiwen Zhang

General AI

Low-rank adaptation (LoRA) is a widely used strategy for efficient fine-tuning of large language models (LLMs), but its strictly linear structure fundamentally limits expressive capacity. The bilinear formulation of weight updates captures only first-order dependencies between low-rank factors, restricting the modeling…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.3

Autonomous Diffractometry Enabled by Visual Reinforcement Learning

2026-04-13 · J. Oppliger, M. Stifter, A. Rüegg, I. Biało, L. Martinelli, P. G. Freeman, D. Prabhakaran, J. Zhao, Q. Wang, J. Chang

General AI

Automation underpins progress across scientific and industrial disciplines. Yet, automating tasks requiring interpretation of abstract visual information remains challenging. For example, crystal alignment strongly relies on humans with the ability to comprehend diffraction patterns. Here we introduce an autonomous syst…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts

2026-04-13 · Chenxi Qing, Junxi Wu, Zheng Liu, Yixiang Qiu, Hongyao Yu, Bin Chen, Hao Wu, Shu-Tao Xia

General AI

Recently, large language models (LLMs) are capable of generating highly fluent textual content. While they offer significant convenience to humans, they also introduce various risks, like phishing and academic dishonesty. Numerous research efforts have been dedicated to developing algorithms for detecting AI-generated …

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

2026-04-13 · Wei Zhao, Zhe Li, Peixin Zhang, Jun Sun

General AI

Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly inc…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

Detecting Safety Violations Across Many Agent Traces

2026-04-13 · Adam Stein, Davis Brown, Hamed Hassani, Mayur Naik, Eric Wong

General AI

To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare, complex, and sometimes even adversarially hidden and only detectable when multiple traces are analyzed together. These challenges arise in diverse settings such as misuse campa…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

Dynamic Summary Generation for Interpretable Multimodal Depression Detection

2026-04-13 · Shiyu Teng, Jiaqing Liu, Hao Sun, Yu Li, Shurong Chai, Ruibo Hou, Tomoko Tateyama, Lanfen Lin, Yen-Wei Chen

General AI

Depression remains widely underdiagnosed and undertreated because stigma and subjective symptom ratings hinder reliable screening. To address this challenge, we propose a coarse-to-fine, multi-stage framework that leverages large language models (LLMs) for accurate and interpretable detection. The pipeline performs bin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

Multi-Task LLM with LoRA Fine-Tuning for Automated Cancer Staging and Biomarker Extraction

2026-04-14 · Jiahao Shao, Anam Nawaz Khan, Christopher Brett, Tom Berg, Xueping Li, Bing Yao

General AI

Pathology reports serve as the definitive record for breast cancer staging, yet their unstructured format impedes large-scale data curation. While Large Language Models (LLMs) offer semantic reasoning, their deployment is often limited by high computational costs and hallucination risks. This study introduces a paramet…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.3

Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation

2026-04-16 · Zoe Fingleton, Nazanin Siavash, Armin Moin

General AI

In this paper, we focus on automating two of the widely used Verification and Validation (V&V) activities in the Software Development Lifecycle (SDLC): Software testing and software inspection (also known as review). Concerning the former, we concentrate on automated test case generation using Large Language Models (LL…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

2026-04-16 · Mélanie Roschewitz, Kenneth Styppa, Yitian Tao, Jiwoong Sohn, Jean-Benoit Delbrouck, Benjamin Gundersen, Nicolas Deperrois, Christian Bluethgen, Julia Vogt, Bjoern Menze, Farhad Nooralahzadeh, Michael Krauthammer, Michael Moor

General AI

Vision-language models (VLM) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or re…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching

2026-04-16 · Aihua Li

General AI

Flow matching retains the generation quality of diffusion models while enabling substantially faster inference, making it a compelling paradigm for generative modeling. However, when applied to language modeling, it exhibits fundamental limitations in representing complex latent distributions with irregular geometries,…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

UrbanClipAtlas: A Visual Analytics Framework for Event and Scene Retrieval in Urban Videos

2026-04-16 · Joel Perca, Luis Sante, Juanpablo Heredia, Joao Rulff, Claudio Silva, Jorge Poco

General AI

Extracting actionable insights from long-duration urban videos is often labor-intensive: analysts must manually sift through raw footage to pinpoint target events or uncover broader behavioral trends. In this work, we present URBANCLIPATLAS, a visual analytics system for exploring long urban videos recorded at street i…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

ECLASS-Augmented Semantic Product Search for Electronic Components

2026-04-21 · Nico Baumgart, Markus Lange-Hegermann, Jan Henze

General AI

Efficient semantic access to industrial product data is a key enabler for factory automation and emerging LLM-based agent workflows, where both human engineers and autonomous agents must identify suitable components from highly structured catalogs. However, the vocabulary mismatch between natural-language queries and a…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

Epistemic orientation in parliamentary discourse is associated with deliberative democracy

2026-04-21 · Segun Aroyehun, Stephan Lewandowsky, David Garcia

General AI

The pursuit of truth is central to democratic deliberation and governance, yet political discourse reflects varying epistemic orientations, ranging from evidence-based reasoning grounded in verifiable information to intuition-based reasoning rooted in beliefs and subjective interpretation. We introduce a scalable appro…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

2026-04-21 · Boyan Shi, Wei Chen, Shuyuan Zhao, Junfeng Shen, Shengnan Guo, Shaojiang Wang, Huaiyu Wan

General AI

The combination of Mixture-of-Experts (MoE) and Low-Rank Adaptation (LoRA) has shown significant potential for enhancing the multi-task learning capabilities of Large Language Models. However, existing methods face two primary challenges: (1) Imprecise Routing in the current MoE-LoRA method fails to explicitly match inp…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

Safe Continual Reinforcement Learning in Non-stationary Environments

2026-04-21 · Austin Coursey, Abel Diaz-Gonzalez, Marcos Quinones-Grueiro, Gautam Biswas

Research Track A · General AI

Reinforcement learning (RL) offers a compelling data-driven paradigm for synthesizing controllers for complex systems when accurate physical models are unavailable; however, most existing control-oriented RL methods assume stationarity and, therefore, struggle in real-world non-stationary deployments where system dynam…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback

2026-04-22 · Guotao Liang, Zhangcheng Wang, Juncheng Hu, Haitao Zhou, Ziteng Xue, Jing Zhang, Dong Xu, Qian Yu

General AI

Multimodal Large Language Models (MLLMs) have shown promising capabilities in generating Scalable Vector Graphics (SVG) via direct code synthesis. However, existing paradigms typically adopt an open-loop "blind drawing" approach, where models generate symbolic code sequences without perceiving intermediate visual outco…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

2026-04-22 · Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, Yu Feng

General AI

LLM agents have begun to find real security vulnerabilities that human auditors and automated fuzzers missed for decades, in source-available targets where the analyst can build and instrument the code. In practice the work is split among several agents, wired together by a harness: the program that fixes which roles e…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

2026-04-23 · Songen Gu, Yuhang Zheng, Weize Li, Yupeng Zheng, Yating Feng, Xiang Li, Yilun Chen, Pengfei Li, Wenchao Ding

General AI

Recently, end-to-end robotic manipulation models have gained significant attention for their generalizability and scalability. However, they often suffer from limited robustness to camera viewpoint changes when training with a fixed camera. In this paper, we propose VistaBot, a novel framework that integrates feed-forw…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

2026-04-23 · Pegah Khayatan, Jayneel Parekh, Arnaud Dapogny, Mustafa Shukor, Alasdair Newson, Matthieu Cord

General AI

Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the vision backbone or the dominance of the…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments

2026-04-24 · Hong Su

General AI

Autonomous robots operating in open environments need the ability to continuously handle tasks that are not covered by predefined local methods. However, existing approaches often rely on repeated large-language-model (LLM) interaction for uncovered tasks, and even successful executions or observed successful external …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.3

RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices

2026-04-24 · Jia Li, Hongyi Deng, Yiran Zhang, Kechi Zhang, Tianqi Shao, Tiankuo Zhao, Weinan Wang, Zhi Jin, Ge Li, Yang Liu, Yingtao Fang, Yihong Dong

General AI

Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress using Large Language Models (LLMs) for code generation. Many benchmarks like HumanEval and EvoCodeBench have been created to evaluate LLMs by requiring them to generate code fr…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.3

Don't Pause! Every prediction matters in a streaming video

2026-04-27 · Dibyadip Chatterjee, Zhanzhong Pang, Fadime Sener, Yale Song, Angela Yao

General AI

Streaming video models should respond the moment an event unfolds, not after the moment has passed. Yet existing online VideoQA benchmarks remain largely retrospective. They pause the video at fixed timestamps, pose questions about current or past events, and score models only at those moments. This protocol leaves str…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model

2026-04-27 · Sinin Zhang, Yunfei Xie, Yuxuan Cheng, Haoyu Zhang, Tong Zhang

General AI

Vision-Language Models (VLMs) have demonstrated strong performance on textbook-style physics problems, yet they frequently fail when confronted with dynamic real-world scenarios that require temporal consistency and causal reasoning across frames. We identify two fundamental challenges underlying these failures: (1) sp…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning

2026-04-27 · Zijian Guo, İlker Işık, H. M. Sabbir Ahmad, Wenchao Li

General AI

Specification-guided reinforcement learning (RL) provides a principled framework for encoding complex, temporally extended tasks using formal specifications such as linear temporal logic (LTL). While recent methods have shown promising results, their ability to generalize across unseen specifications and diverse enviro…

Review
pending
Role
unreviewed
Read
now
arxiv Score 9.3

QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

2026-04-28 · Shuxiang Cao, Zijian Zhang, Abhishek Agarwal, Grace Bratrud, Niyaz R. Beysengulov, Daniel C. Cole, Alejandro Gómez Frieiro, Elena O. Glen, Hao Hsu, Gang Huang, Raymond Jow, Greshma Shaji, Tom Lubowe, Ligeng Zhu, Luis Mantilla Calderón, Nicola Pancotti, Joel Pendleton, Brandon Severin, Charles Etienne Staub, Sara Sussman, Antti Vepsäläinen, Neel Rajeshbhai Vora, Yilun Xu, Varinia Bernales, Daniel Bowring, Elica Kyoseva, Ivan Rungger, Giulia Semeghini, Sam Stanwyck, Timothy Costa, Alán Aspuru-Guzik, Krysta Svore

Research Track A · General AI

Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEval, the first VLM benchmark for quantum …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.2

DreamProver: Evolving Transferable Lemma Libraries via a Wake-Sleep Theorem-Proving Agent

2026-04-29 · Youyuan Zhang, Jialiang Sun, Hangrui Bi, Chuqin Geng, Wenjie Ma, Zhaoyu Li, Xujie Si

General AI

We introduce DreamProver, an agentic framework that leverages a "wake-sleep" program induction paradigm to discover reusable lemmas for formal theorem proving. Existing approaches either rely on fixed lemma libraries, which limit adaptability, or synthesize highly specific intermediate lemmas tailored to individual the…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.2

Graph-based Semantic Calibration Network for Unaligned UAV RGBT Image Semantic Segmentation and A Large-scale Benchmark

2026-04-29 · Fangqiang Fan, Zhicheng Zhao, Xiaoliang Ma, Chenglong Li, Jin Tang

General AI

Fine-grained RGBT image semantic segmentation is crucial for all-weather unmanned aerial vehicle (UAV) scene understanding. However, UAV RGBT semantic segmentation faces two coupled challenges: cross-modal spatial misalignment caused by sensor parallax and platform vibration, and severe semantic confusion among fine-gr…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.2

Select to Think: Unlocking SLM Potential with Local Sufficiency

2026-04-29 · Wenxuan Ye, Yangyang Zhang, Xueli An, Georg Carle, Yunpu Ma

General AI

Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls intro…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.2

Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance

2026-04-29 · Lingfeng Zhang, Xiaoshuai Hao, Xizhou Bu, Yingbo Tang, Hongsheng Li, Jinghui Lu, Xiu-shen Wei, Jiayi Ma, Yu Liu, Jing Zhang, Hangjun Ye, Xiaojun Liang, Long Chen, Wenbo Ding

General AI

Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-h…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.2

Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters

2026-05-04 · Lingxiao Kong, Cong Yang, Oya Deniz Beyan, Zeyd Boukhers

General AI

Despite significant advances in Reinforcement Learning (RL), model performance remains highly sensitive to algorithm and hyperparameter configurations, while generalization gaps across environments complicate real-world deployment. Although prior work has studied RL generalization, the relative contribution of specific…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.2

HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

2026-05-04 · Vicente Pelechanoa, Antoni Mestre, Manoli Albert, Miriam Gil

General AI

Deciding how to distribute work between humans and AI systems is a central challenge in organisational design. Most approaches treat this as a binary choice, yet the operational reality is richer: humans and AI routinely share tasks or take complementary roles depending on context, fatigue, and the stakes involved. Gov…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.2

MolmoAct2: Action Reasoning Models for Real-world Deployment

2026-05-04 · Haoquan Fang, Jiafei Duan, Donovan Clay, Sam Wang, Shuo Liu, Weikai Huang, Xiang Fan, Wei-Chuan Tsai, Shirui Chen, Yi Ru Wang, Shanli Xing, Jaemin Cho, Jae Sung Park, Ainaz Eftekhar, Peter Sushko, Karen Farley, Angad Wadhwa, Cole Harrison, Winson Han, Ying-Chun Lee, Eli VanderBilt, Rose Hendrix, Suveen Ellawela, Lucas Ngoo, Joyce Chai, Zhongzheng Ren, Ali Farhadi, Dieter Fox, Ranjay Krishna

General AI

Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency fo…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.2

Virtual Scanning for NSCLC Histology: Investigating the Discriminatory Power of Synthetic PET

2026-05-04 · Fatih Aksu, Laura Ciuffetti, Francesco Di Feola, Filippo Ruffini, Giulia Romoli, Fabrizia Gelardi, Arturo Chiti, Valerio Guarrasi, Paolo Soda

General AI

Accurate histological differentiation between adenocarcinoma (ADC) and squamous cell carcinoma (SCC) is critical for personalized treatment in non-small cell lung cancer (NSCLC). While [¹⁸F]FDG PET/CT is a standard tool for the clinical evaluation of lung cancer, its utility is often limited by high costs and radi…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.0

VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing

2026-02-22 · Juan Rodriguez, Haotian Zhang, Abhay Puri, Tianyang Zhang, Rishav Pramanik, Meng Lin, Xiaoqing Xie, Marco Terral, Darsh Kaushik, Aly Shariff, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli

General AI

We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. Our benchmark comprises four t…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.0

Working Notes on Late Interaction Dynamics: Analyzing Targeted Behaviors of Late Interaction Models

2026-03-27 · Antoine Edy, Max Conti, Quentin Macé

General AI

While Late Interaction models exhibit strong retrieval performance, many of their underlying dynamics remain understudied, potentially hiding performance bottlenecks. In this work, we focus on two topics in Late Interaction retrieval: a length bias that arises when using multi-vector scoring, and the similarity distrib…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.0

Alertness Optimization for Shift Workers Using a Physiology-based Mathematical Model

2026-03-30 · Zidi Tao, A. Agung Julius, John T Wen

Research Track A

Sleep is vital for maintaining cognitive function, facilitating metabolic waste removal, and supporting memory consolidation. However, modern societal demands, particularly shift work, often disrupt natural sleep patterns. This can induce excessive sleepiness among shift workers in critical sectors such as healthcare a…

Review
pending
Role
unreviewed
Read
now
huggingface Score 9.0

DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing

2026-03-30 · Kailai Feng, Yuxiang Wei, Bo Chen, Yang Pan, Hu Ye, Songwei Liu, Chenqian Yan, Yuan Gao

General AI

Diffusion models have made significant progress in both text-to-image (T2I) generation and text-guided image editing. However, these models are typically built with billions of parameters, leading to high latency and increased deployment challenges. While on-device diffusion models improve efficiency, they largely focu…

Review
pending
Role
unreviewed
Read
now
huggingface Score 9.0

CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

2026-03-31 · Shifang Zhao, Yihan Hu, Ying Shan, Yunchao Wei, Xiaodong Cun

General AI

Editing the video content with audio alignment forms a digital human-made art in current social media. However, the time-consuming and repetitive nature of manual video editing has long been a challenge for filmmakers and professional content creators alike. In this paper, we introduce CutClaw, an autonomous multi-agen…

Review
pending
Role
unreviewed
Read
now
huggingface Score 9.0

MedGemma 1.5 Technical Report

2026-04-06 · Andrew Sellergren, Chufan Gao, Fereshteh Mahvar, Timo Kohlberger, Fayaz Jamil, Madeleine Traverse, Alberto Tono, Bashir Sadjad, Lin Yang, Charles Lau, Liron Yatziv, Tiffany Chen, Bram Sterling, Kenneth Philbrick, Richa Tiwari, Yun Liu, Madhuram Jajoo, Chandrashekar Sankarapu, Swapnil Vispute, Harshad Purandare, Abhishek Bijay Mishra, Sam Schmidgall, Tao Tu, Anil Palepu, Chunjong Park, Tim Strother, Rahul Thapa, Yong Cheng, Preeti Singh, Kat Black, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Joelle Barral, Tris Warkentin, Shravya Shetty, Dale Webster, Sunny Virmani, David F. Steiner, Can Kirmizibayrak, Daniel Golden

General AI

We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes and histopathology whole slide images), anatomical localization via bounding boxes, multi-timepoint chest X-ray analysis,…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.0

Small Vision-Language Models are Smart Compressors for Long Video Understanding

2026-04-09 · Junjie Fei, Jun Chen, Zechun Liu, Yunyang Xiong, Chong Zhou, Wei Wen, Junlin Han, Mingchen Zhuge, Saksham Suri, Qi Qian, Shuming Liu, Lemeng Wu, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Chenchen Zhu

General AI

Adapting Multimodal Large Language Models (MLLMs) for hour-long videos is bottlenecked by context limits. Dense visual streams saturate token budgets and exacerbate the lost-in-the-middle phenomenon. Existing heuristics, like sparse sampling or uniform pooling, blindly sacrifice fidelity by discarding decisive moments …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.0

Failure Ontology: A Lifelong Learning Framework for Blind Spot Detection and Resilience Design

2026-04-12 · Yuan Sun, Hong Yi, Jinyuan Liu

Research Track A

Personalized learning systems are almost universally designed around a single objective: help people acquire knowledge and skills more efficiently. We argue this framing misses the more consequential problem. The most damaging failures in human life (financial ruin, health collapse, professional obsolescence) are rarely …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.0

Physics-Informed Neural Networks for Methane Sorption: Cross-Gas Transfer Learning, Ensemble Collapse Under Physics Constraints, and Monte Carlo Dropout Uncertainty Quantification

2026-04-15 · Mohammad Nooraiepour, Zezhang Song, Wei Li, Sarah Perez

Research Track A

Accurate methane sorption prediction across heterogeneous coal ranks requires models that combine thermodynamic consistency, efficient knowledge transfer across data-scarce geological systems, and calibrated uncertainty estimates, capabilities that are rarely addressed together in existing frameworks. We present a phys…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.0

MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining

2026-04-27 · Phung Gia Huy, Hai An Vu, Minh-Phuc Truong, Thang Duc Tran, Linh Ngo Van, Thanh Hong Nguyen, Trung Le

Research Track A · General AI

Representation learning is fundamental to NLP, but building embeddings that work well at different computational budgets is challenging. Matryoshka Representation Learning (MRL) offers a flexible inference paradigm through nested embeddings; however, learning such structures requires explicit coordination of how inform…

Review
pending
Role
unreviewed
Read
now
huggingface Score 9.0

RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation

2026-05-06 · Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov, Mikhail Kulakov

General AI

We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned har…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.0

CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs

2026-05-07 · Zhengru Fang, Yanan Ma, Yu Guo, Senkang Hu, Yixian Zhang, Hangcheng Cao, Wenbo Ding, Yuguang Fang

Research Track A · General AI

When a chest X-ray shows consolidation but the question asks which finding is present, a medical vision-language model may answer "No consolidation." This is more than an incorrect choice: it is a polarity reversal that emits a clinical statement contradicting the image. We study this failure as negated-option attracti…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.0

TIDE: Every Layer Knows the Token Beneath the Context

2026-05-07 · Ajay Jaiswal, Lauren Hannah, Han-Byul Kim, Duc Hoang, Mehrdad Farajtabar, Minsik Cho

General AI

We revisit a universally accepted but under-examined design choice in every modern LLM: a token index is looked up once at the input embedding layer and then permanently discarded. This single-injection assumption induces two structural failures: (i) the Rare Token Problem, where a Zipf-type distribution of vocabulary …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation

2026-03-23 · Donald Shenaj, Federico Errica, Antonio Carta

General AI

Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the pers…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis

2026-03-24 · Wei Sun, Ting Wang, Xinran Tian, Wanshun Lan, Xuhan Feng, Haoyue Li, Fangxin Wang

General AI

Existing LLM-based Kubernetes diagnostic systems cannot learn from operational experience, operating on static knowledge bases without improving from past resolutions. We present MetaKube, an experience-aware LLM framework through three synergistic innovations: (1) an Episodic Pattern Memory Network (EPMN) that abstrac…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Beyond Benchmarks: How Users Evaluate AI Chat Assistants

2026-03-26 · Moiz Sadiq Awan, Muhammad Haris Noor, Muhammad Salman Munaf

Research Track A · General AI

Automated benchmarks dominate the evaluation of large language models, yet no systematic study has compared user satisfaction, adoption motivations, and frustrations across competing platforms using a consistent instrument. We address this gap with a cross-platform survey of 388 active AI chat users, comparing satisfac…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

Hierarchy-Guided Multimodal Representation Learning for Taxonomic Inference

2026-03-26 · Sk Miraj Ahmed, Xi Yu, Yunqi Li, Yuewei Lin, Wei Xu

General AI

Accurate biodiversity identification from large-scale field data is a foundational problem with direct impact on ecology, conservation, and environmental monitoring. In practice, the core task is taxonomic prediction - inferring order, family, genus, or species from imperfect inputs such as specimen images, DNA barcode…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

RefAlign: Representation Alignment for Reference-to-Video Generation

2026-03-26 · Lei Wang, YuXin Song, Ge Wu, Haocheng Feng, Hang Zhou, Jingdong Wang, Yaxing Wang, Jian Yang

General AI

Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applications such as personalized advertising and virtual try-on. In practice, existing R2V methods typically introduce additional high-level seman…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

KAT-Coder-V2 Technical Report

2026-03-29 · Fengxiang Li, Han Zhang, Haoyang Huang, Jinghui Wang, Jinhua Hao, Kun Yuan, Mengtong Li, Minglei Zhang, Pengcheng Xu, Wenhao Zhuang, Yizhen Shao, Zongxian Feng, Can Tang, Chao Wang, Chengxiao Tong, Fan Yang, Gang Xiong, Haixuan Gao, Han Gao, Hao Wang, Haochen Liu, Hongliang Sun, Jiabao Li, Jingwen Chang, Jun Du, Junyi Peng, Leizhen Cui, Meimei Jing, Mingqi Wu, Shangpeng Yan, Shaotong Qi, Suzhe Xu, Wenxuan Zhao, Xianda Sun, Xuan Xie, Yanbo Wang, Yao Xia, Yinghan Cui, Yingpeng Chen, Yong Wang, Yuze Shi, Zhiwei Shen, Ziyu Wang, Ming Sun, Lin Ye, Bin Chen

General AI

We present KAT-Coder-V2, an agentic coding model developed by the KwaiKAT team at Kuaishou. KAT-Coder-V2 adopts a "Specialize-then-Unify" paradigm that decomposes agentic coding into five expert domains - SWE, WebCoding, Terminal, WebSearch, and General - each undergoing independent supervised fine-tuning and reinforce…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN

2026-03-30 · Gabriele Gemmi, Michele Polese, Tommaso Melodia

General AI

The large-scale deployment of 5G networks has not delivered the expected return on investment for mobile network operators, raising concerns about the economic viability of future 6G rollouts. At the same time, surging demand for Artificial Intelligence (AI) inference and training workloads is straining global compute …

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

DRIVE-Nav: Directional Reasoning, Inspection, and Verification for Efficient Open-Vocabulary Navigation

2026-03-30 · Maoguo Gao, Zejun Zhu, Zhiming Sun, Zhengwei Ma, Longze Yuan, Zhongjing Ma, Zhigang Gao, Jinhui Zhang, Suli Zou

General AI

Open-Vocabulary Object Navigation (OVON) requires an embodied agent to locate a language-specified target in unknown environments. Existing zero-shot methods often reason over dense frontier points under incomplete observations, causing unstable route selection, repeated revisits, and unnecessary action overhead. We pr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

Dynamic Lookahead Distance via Reinforcement Learning-Based Pure Pursuit for Autonomous Racing

2026-03-30 · Mohamed Elgouhary, Amr S. El-Wakeel

General AI

Pure Pursuit (PP) is a widely used path-tracking algorithm in autonomous vehicles due to its simplicity and real-time performance. However, its effectiveness is sensitive to the choice of lookahead distance: shorter values improve cornering but can cause instability on straights, while longer values improve smoothness …

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

Graphilosophy: Graph-Based Digital Humanities Computing with The Four Books

2026-03-30 · Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khanh-Duy Le, Minh-Triet Tran, Tam V. Nguyen, Trung-Nghia Le

General AI

The Four Books have shaped East Asian intellectual traditions, yet their multi-layered interpretive complexity limits their accessibility in the digital age. While traditional bilingual commentaries provide a vital pedagogical bridge, computational frameworks are needed to preserve and explore this wisdom. This paper b…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

GreenFLag: A Green Agentic Approach for Energy-Efficient Federated Learning

2026-03-31 · Theodora Panagea, Nikolaos Koursioumpas, Lina Magoula, Ramin Khalili

General AI

Progressing toward a new generation of mobile networks, a clear focus on integrating distributed intelligence across the system is observed to drive performance, autonomy, and real-time adaptability. Federated learning (FL) stands out as a key emerging technique, enabling on-device model training while preserving data …

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

SkillReducer: Optimizing LLM Agent Skills for Token Efficiency

2026-03-31 · Yudong Gao, Zongjie Li, Yuanyuanyuan, Zimo Ji, Pingchuan Ma, Shuai Wang

General AI

LLM-based coding agents rely on skills, pre-packaged instruction sets that extend agent capabilities, yet every token of skill content injected into the context window incurs both monetary cost and attention dilution. To understand the severity of this problem, we conduct a large-scale empirical study of 55,315 …

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

Adapting Text LLMs to Speech via Multimodal Depth Up-Scaling

2026-04-01 · Kazuki Yano, Jun Suzuki, Shinji Watanabe

General AI

Adapting pre-trained text Large Language Models (LLMs) into Speech Language Models (Speech LMs) via continual pretraining on speech data is promising, but often degrades the original text capabilities. We propose Multimodal Depth Upscaling, an extension of an emerging strategy in continual LLM pre-training, where new t…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs

2026-04-02 · Abinitha Gourabathina, Inkit Padhi, Manish Nagireddy, Subhajit Chaudhury, Prasanna Sattigeri

General AI

For Large Language Models (LLMs) to be reliably deployed, models must effectively know when not to answer: abstain. Reasoning models, in particular, have gained attention for impressive performance on complex tasks. However, reasoning models have been shown to have worse abstention abilities. Taking the vulnerabilities…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Impact of Multimodal and Conversational AI on Learning Outcomes and Experience

2026-04-02 · Karan Taneja, Anjali Singh, Ashok K. Goel

General AI

Multimodal Large Language Models (MLLMs) offer an opportunity to support multimedia learning through conversational systems grounded in educational content. However, while conversational AI is known to boost engagement, its impact on learning in visually-rich STEM domains remains under-explored. Moreover, there is limi…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

VOID: Video Object and Interaction Deletion

2026-04-02 · Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan, Ta-Ying Cheng

General AI

Existing video object removal methods excel at inpainting content "behind" the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions, such as collisions with other objects, current models fail to correct them and produce impl…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Debiasing LLMs by Fine-tuning

2026-04-03 · Zhenyu Gao, Wenxi Jiang, Yutong Yan

General AI

Prior research shows that large language models (LLMs) exhibit systematic extrapolation bias when forming predictions from both experimental and real-world data, and that prompt-based approaches appear limited in alleviating this bias. We propose a supervised fine-tuning (SFT) approach that uses Low-Rank Adaptation (Lo…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters

2026-04-03 · Haotian Xiang, Bingcong Li, Qin Lu

General AI

When deploying large language models (LLMs) to safety-critical applications, uncertainty quantification (UQ) is of utmost importance to self-assess the reliability of the LLM-based decisions. However, such decisions typically suffer from overconfidence, particularly after parameter-efficient fine-tuning (PEFT) for down…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Agentic Federated Learning: The Future of Distributed Training Orchestration

2026-04-06 · Rafael O. Jarczewski, Gabriel U. Talasso, Leandro Villas, Allan M. de Souza

General AI

Although Federated Learning (FL) promises privacy and distributed collaboration, its effectiveness in real-world scenarios is often hampered by the stochastic heterogeneity of clients and unpredictable system dynamics. Existing static optimization approaches fail to adapt to these fluctuations, resulting in resource un…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

Analyzing Symbolic Properties for DRL Agents in Systems and Networking

2026-04-06 · Mohammad Zangooei, Jannis Weil, Amr Rizk, Mina Tahmasbi Arashloo, Raouf Boutaba

General AI

Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congestion control. For safe deployment, however, it is critical to reason about how agents behave across the range of system st…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

Gym-Anything: Turn any Software into an Agent Environment

2026-04-07 · Pranjal Aggarwal, Graham Neubig, Sean Welleck

General AI

Computer-use agents hold the promise of assisting in a wide range of digital economic activities. However, current research has largely focused on short-horizon tasks over a limited set of software with limited economic value, such as basic e-commerce and OS-configuration tasks. A key reason is that creating environmen…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery

2026-04-07 · Hao Chen, Fang Qiu, Fangchao Dong, Defei Yang, Eve Bohnett, Li An

General AI

This study proposes a lightweight multimodal adaptation framework to bridge the representation gap between RGB-pretrained VLMs and thermal infrared imagery, and demonstrates its practical utility using a real drone-collected dataset. A thermal dataset was developed from drone-collected imagery and was used to fine-tune…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

MAESTRO: Adapting GUIs and Guiding Navigation with User Preferences in Conversational Agents with GUIs

2026-04-07 · Sangwook Lee, Sang Won Lee, Adnan Abbas, Young-Ho Kim, Yan Chen

General AI

Modern task-oriented chatbots present GUI elements alongside natural-language dialogue, yet the agent's role has largely been limited to interpreting natural-language input as GUI actions and following a linear workflow. In preference-driven, multi-step tasks such as booking a flight or reserving a restaurant, earlier …

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

PSI: Shared State as the Missing Layer for Coherent AI-Generated Instruments in Personal AI Agents

2026-04-09 · Zhiyuan Wang, Erzhen Hu, Mark Rucker, Laura E. Barnes

General AI

Personal AI tools can now be generated from natural-language requests, but they often remain isolated after creation. We present PSI, a shared-state architecture that turns independently generated modules into coherent instruments: persistent, connected, and chat-complementary artifacts accessible through both GUIs and…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.8

Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers

2026-04-28 · Jan Dubiński, Jan Betley, Anna Sztyber-Betley, Daniel Tan, Owain Evans

General AI

Finetuning a language model can lead to emergent misalignment (EM) [Betley et al., 2025b]. Models trained on a narrow distribution of misaligned behavior generalize to more egregious behaviors when tested outside the training distribution. We study a set of interventions proposed to reduce EM. We confirm that these int…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling

2026-04-28 · Lucio La Cava, Andrea Tagarelli

General AI

Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions

2026-04-28 · An Nguyen, Hoang Nguyen, Phuong Le, Hung Pham, Cuong Do, Laurent El Ghaoui

General AI

We study city-scale control of electric-vehicle (EV) ride-hailing fleets where dispatch, repositioning, and charging decisions must respect charger and feeder limits under uncertain, spatially correlated demand and travel times. We formulate the problem as a hex-grid semi-Markov decision process (semi-MDP) with mixed a…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Relit-LiVE: Relight Video by Jointly Learning Environment Video

2026-05-07 · Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang

General AI

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decompositio…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Rethinking Adapter Placement: A Dominant Adaptation Module Perspective

2026-05-07 · Suoxin Zhang, Run He, Di Fang, Xiang Tan, Kaixuan Chen, Huiping Zhuang

General AI

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method that places trainable low-rank adapters into frozen pre-trained models. Recent studies show that using fewer LoRA adapters may still maintain or even improve performance, but existing methods still distribute adapters broadly, leaving wh…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

2026-05-07 · Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier

General AI

Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.6

xApp Empowered Resource Management for Non-Terrestrial Users in 5G O-RAN Networks

2026-05-11 · Mohammed M. H. Qazzaz, Syed Ali Zaidi, Aubida A. Al-Hameed, Abdelaziz Salama, Des Mclernon

General AI

This paper introduces a proactive Unmanned Aerial Vehicle (UAV) mobility management xApp for Open Radio Access Network (O-RAN) Near Real-Time Radio Intelligent Controller (Near-RT RIC) environments, employing Double Deep Q-Network (DDQN) reinforcement learning (RL) enhanced with transfer learning to optimise handover d…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.5

Anansi: Scalable Characterization of Message-Based Job Scams

2026-02-27 · Abisheka Pitumpe, Amir Rahmati

Research Track B · General AI

Job-based smishing scams, where victims are recruited under the guise of remote job opportunities, represent a rapidly growing and understudied threat within the broader landscape of online fraud. In this paper, we present Anansi, the first scalable, end-to-end measurement pipeline designed to systematically engage wit…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.5

Associative Constructive Evolution: Enhancing Metaheuristics through Hebbian-Learned Generative Guidance

2026-03-31 · Shanxian Lin, Yuichi Nagata, Haichuan Yang

Research Track A

Metaheuristic algorithms such as Particle Swarm Optimization (PSO) and Evolutionary Algorithms (EA) excel at exploring solution spaces but lack mechanisms to accumulate and reuse procedural knowledge from successful search trajectories. This paper proposes Associative Constructive Evolution (ACE), a framework that enha…

Review
pending
Role
unreviewed
Read
now
arxiv Score 8.5

NetSecBed: A Container-Native Testbed for Reproducible Cybersecurity Experimentation

2026-04-05 · Leonardo Bitzki, Diego Kreutz, Tiago Heinrich, Douglas Fideles, Leandro Bertholdo, Silvio Quincozes, Angelo Diniz

Research Track A

Cybersecurity research increasingly depends on reproducible evidence, such as traffic traces, logs, and labeled datasets, yet most public datasets remain static and offer limited support for controlled re-execution and traceability, especially in heterogeneous multi-protocol environments. This paper presents NetSecBed,…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

2026-04-09 · Luozheng Qin, Jia Gong, Qian Qiao, Tianjiao Li, Li Xu, Haoyu Pan, Chao Qu, Zhiyu Tan, Hao Li

General AI

Unified multimodal models integrating visual understanding and generation face a fundamental challenge: visual generation incurs substantially higher computational costs than understanding, particularly for video. This imbalance motivates us to invert the conventional paradigm: rather than extending understanding-centr…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

2026-04-13 · Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar, Lasha Koroshinadze, Nishit Anand, Zhifeng Kong, Siddharth Gururani, Sang-gil Lee, Jaehyeon Kim, Aya Aljafari, Chao-Han Huck Yang, Sungwon Kim, Ramani Duraiswami, Dinesh Manocha, Mohammad Shoeybi, Bryan Catanzaro, Ming-Yu Liu, Wei Ping

General AI

We present Audio Flamingo Next (AF-Next), the next-generation and most capable large audio-language model in the Audio Flamingo series, designed to advance understanding and reasoning over speech, environmental sounds and music. Compared to Audio Flamingo 3, AF-Next introduces: (i) a stronger foundational audio-languag…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.5

LightTune: Lightweight Forward-Only Online Fine-Tuning with Applications to Link Adaptation

2026-04-14 · Ramy E. Ali, Federico Penna

Research Track A

Deploying machine learning (ML) algorithms on mobile phones is bottlenecked by performance degradation under dynamic, real-world conditions that differ from the offline training conditions. While continual learning and adaptation are essential to mitigate this distributional shift, conventional online learning methods …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

2026-04-15 · Wangjie Gan, Miao Pan, Linbo Xi, Wenqi Zhang, Jintao Chen, Jianwei Yin, Xuhong Zhang

General AI

Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a speci…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

Learning Adaptive Reasoning Paths for Efficient Visual Reasoning

2026-04-16 · Yixu Huang, Tinghui Zhu, Muhao Chen

General AI

Visual reasoning models (VRMs) have recently shown strong cross-modal reasoning capabilities by integrating visual perception with language reasoning. However, they often suffer from overthinking, producing unnecessarily long reasoning chains for any task. We attribute this issue to Reasoning Path Redundancy in visual…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models

2026-04-16 · Haoyi Sun, Xiaoxiao Wang, Ning Mao, Qian Wang, Lifu Mu, Wen Zheng, Tao Wei, Wei Chen

General AI

Vision-Language Models (VLMs) have shown remarkable capabilities in joint vision-language understanding, but their large scale poses significant challenges for deployment in resource-constrained scenarios. Knowledge Distillation (KD) offers a viable way to improve model capabilities without increasing model size or dat…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

Qwen3.5-Omni Technical Report

2026-04-17 · Qwen Team

General AI

In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor, Qwen3.5-Omni scales to hundreds of billions of parameters and supports a 256k context length. By leveraging a massive dataset comprising heterogeneous text-vision pairs…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

Speculative Decoding for Autoregressive Video Generation

2026-04-19 · Yuezhou Hu, Jintao Zhang

General AI

Autoregressive video diffusion is emerging as a promising paradigm for streaming video synthesis, with step distillation serving as the primary means of accelerating inference. Whether speculative decoding, the dominant acceleration strategy for large language models, can be effectively adapted to autoregressive video …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

2026-04-20 · Rongyuan Tan, Jue Zhang, Zhuozhao Li, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

General AI

Interpretability tools are increasingly used to analyze failures of Large Language Models (LLMs), yet prior work largely focuses on short prompts or toy settings, leaving their behavior on commonly used benchmarks underexplored. To address this gap, we study contrastive, LRP-based attribution as a practical tool for an…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

2026-04-21 · Chaitanya Dwivedi, Binxuan Huang, Himanshu Gupta, Pratik Jayarao, Neeraj Varshney, Bing Yin

General AI

Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality scales predictably with total parameters, an…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

2026-04-21 · Ying Zeng, Miaosen Luo, Guangyuan Li, Yang Yang, Ruiyang Fan, Linxiao Shi, Qirui Yang, Jian Zhang, Chengcheng Liu, Siming Zheng, Jinwei Chen, Bo Li, Peng-Tao Jiang

General AI

Traditional photographic image editing typically requires users to possess sufficient aesthetic understanding to provide appropriate instructions for adjusting image quality and camera parameters. However, this paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or i…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

2026-04-24 · Harshit Joshi, Priyank Shethia, Jadelynn Dao, Monica S. Lam

General AI

Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections grow. A common workaround is to decompose documents into chunks and assemble answers from…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.5

Neural Recovery of Historical Lexical Structure in Bantu Languages from Modern Data

2026-04-24 · Hillary Mutisya, John Mugane

Research Track A · General AI

We investigate whether neural models trained exclusively on modern morphological data can recover cross-lingual lexical structure consistent with historical reconstruction. Using BantuMorph v7, a transformer over Bantu morphological paradigms, we analyze 14 Eastern and Southern Bantu languages, extract encoder embeddin…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.5

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

2026-05-12 · Bo Yin, Qi Li, Xinchao Wang

General AI

Tool-using LLM agents fail through trajectories rather than only final responses, as they may execute unsafe tool calls, follow injected instructions, comply with harmful requests, or over-refuse benign tasks despite producing a seemingly safe answer. Existing safety-alignment signals are largely response-level or off-…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.4

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

2026-04-29 · Jun Guo, Qiwei Li, Peiyan Li, Zilong Chen, Nan Sun, Yifei Su, Heyun Wang, Yuan Zhang, Xinghang Li, Huaping Liu

General AI

We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM) that only model 2D pixel-space and fail to balance action effic…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.4

Instruction-Guided Poetry Generation in Arabic and Its Dialects

2026-04-30 · Abdelrahman Sadallah, Kareem Elozeiri, Mervat Abassy, Rania Elbadry, Mohamed Anwar, Abed Alhakim Freihat, Preslav Nakov, Fajri Koto

General AI

Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of expression and cultural identity. While modern Arabic speakers continue to value poetry, existing research on Arabic poetry within Large Language Models (LLMs) has primarily focused on analysis tasks such as interpretation or m…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.4

Code World Model Preparedness Report

2026-05-01 · Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd, Nathaniel Li, Ziwen Han, Jean-Christophe Testud, Saisuke Okabayashi, Maeve Ryan, Jinpeng Miao, Hamza Kwisaba, Felix Binder, Spencer Whitman, Jim Gust, Esteban Arcaute, Dhaval Kapil, Jacob Kahn, Ayaz Minhas, Tristan Goodman, Lauren Deason, Alexander Vaughan, Shengjia Zhao, Summer Yue

General AI

This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned pro…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.4

From Context to Skills: Can Language Models Learn from Context Skillfully?

2026-05-03 · Shuzheng Si, Haozhe Zhao, Yu Lei, Qingyi Wang, Dingwei Chen, Zhitong Wang, Zhenhailong Wang, Kangyang Luo, Zheng Wang, Gang Chen, Fanchao Qi, Minjia Zhang, Maosong Sun

General AI

Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

The nextAI Solution to the NeurIPS 2023 LLM Efficiency Challenge

2026-04-10 · Gyuwon Park, DongIl Shin, SolGil Oh, SangGi Ryu, Byung-Hak Kim

General AI

The rapid evolution of Large Language Models (LLMs) has significantly impacted the field of natural language processing, but their growing complexity raises concerns about resource usage and transparency. Addressing these challenges, we participated in the NeurIPS LLM Efficiency Challenge, aiming to fine-tune a foundat…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

CLSGen: A Dual-Head Fine-Tuning Framework for Joint Probabilistic Classification and Verbalized Explanation

2026-04-13 · WonJin Yoon, Kangyu Zhu, Ian Bulovic, Autumn Sehy, Yanjun Gao, Dmitriy Dligach, Majid Afshar, Timothy A. Miller

Research Track A · General AI

With the recent progress of Large Language Models (LLMs), there is a growing interest in applying these models to solve complex and challenging problems. Modern LLMs, capable of processing long contexts and generating verbalized explanations, offer significant potential in addressing real-world applications. However, a…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Disentangled Point Diffusion for Precise Object Placement

2026-04-13 · Lyuxing He, Eric Cai, Shobhit Aggarwal, Jianjun Wang, David Held

General AI

Recent advances in robotic manipulation have highlighted the effectiveness of learning from demonstration. However, while end-to-end policies excel in expressivity and flexibility, they struggle both in generalizing to novel object geometries and in attaining a high degree of precision. An alternative, object-centric a…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

2026-04-14 · Kathakoli Sengupta, Kai Ao, Paola Cascante-Bonilla

General AI

Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene graphs, yet evaluation still relies on LLM or VLM judges that score rendered views, making judgments sensitive to viewpoint, prompt phrasing, and hallucination. Wh…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

TLoRA+: A Low-Rank Parameter-Efficient Fine-Tuning Method for Large Language Models

2026-04-15 · Yarui Cao, Kai Liu

General AI

Fine-tuning large language models (LLMs) aims to adapt pre-trained models to specific tasks using relatively small and domain-specific datasets. Among Parameter-Efficient Fine-Tuning (PEFT) methods, Low-Rank Adaptation (LoRA) stands out by matching the performance of full fine-tuning while avoiding additional inference…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

CAVERS: Multimodal SLAM Data from a Natural Karstic Cave with Ground Truth Motion Capture

2026-04-16 · Giacomo Franchini, David Rodríguez-Martínez, Alfonso Martínez-Petersen, C. J. Pérez-del-Pulgar, Marcello Chiaberge

General AI

Autonomous robots operating in natural karstic caves face perception and navigation challenges that are qualitatively distinct from those encountered in mines or tunnels: irregular geometry, reflective wet surfaces, near-zero ambient light, and complex branching passages. Yet publicly available datasets targeting this …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Generalization in LLM Problem Solving: The Case of the Shortest Path

2026-04-16 · Yao Tong, Jiayuan Ye, Anastasia Borovykh, Reza Shokri

General AI

Whether language models can systematically generalize remains actively debated. Yet empirical performance is jointly shaped by multiple factors such as training data, training paradigms, and inference-time strategies, making failures difficult to interpret. We introduce a controlled synthetic environment based on short…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Prism: Symbolic Superoptimization of Tensor Programs

2026-04-16 · Mengdi Wu, Xiaoyu Jiang, Oded Padon, Zhihao Jia

General AI

This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constru…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Think in Latent Thoughts: A New Paradigm for Gloss-Free Sign Language Translation

2026-04-16 · Yiyang Jiang, Li Zhang, Xiao-Yong Wei, Li Qing

General AI

Many SLT systems quietly assume that brief chunks of signing map directly to spoken-language words. That assumption breaks down because signers often create meaning on the fly using context, space, and movement. We revisit SLT and argue that it is mainly a cross-modal reasoning task, not just a straightforward video-to…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization

2026-04-17 · Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin

General AI

We propose HILBERT (HIerarchical Long-sequence Balanced Embedding with Reciprocal contrastive Training), a cross-attentive multimodal framework for learning document-level audio-text representations from long, segmented sequences in low-resource data settings. HILBERT leverages frozen pre-trained speech and language en…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

2026-04-17 · Xiangbo Gao, Sicong Jiang, Bangya Liu, Xinghao Chen, Minglai Yang, Siyuan Yang, Mingyang Wu, Jiongze Yu, Qi Zheng, Haozhi Wang, Jiayi Zhang, Jared Yang, Jie Yang, Zihan Wang, Qing Yin, Zhengzhong Tu

General AI

As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured footage to meet professional requirements. Yet the field still lacks both a large-scale human-annotated dataset with complete editing examples and a standardized evaluat…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs

2026-04-20 · Kevin Murphy

General AI

We present BLF (Bayesian Linguistic Forecaster), an agentic system for binary forecasting that achieves state-of-the-art performance on the ForecastBench benchmark. The system is built on three ideas. (1) A Bayesian linguistic belief state: a semi-structured representation combining numerical probability estimates with…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective

2026-04-20 · Sijie Mai, Shiqin Han

General AI

Multimodal affective computing aims to predict humans' sentiment, emotion, intention, and opinion using language, acoustic, and visual modalities. However, current models often learn spurious correlations that harm generalization under distribution shifts or noisy modalities. To address this, we propose a causal modali…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models

2026-04-21 · Yusuf Çelebi, Yağız Asker, Özay Ezerceli, Mahmoud ElHussieni, Selva Taş, Reyhan Bayraktar, Fatma Betül Terzioğlu

General AI

Fine-tuning Large Language Models (LLMs) remains structurally uncertain despite parameter-efficient methods such as Low-Rank Adaptation (LoRA), as the layer-specific roles of internal representations are poorly understood, leading to heuristic decisions about where adaptation should be applied. We model the evolution o…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model

2026-04-21 · Zewei Zhou, Ruining Yang, Xuewei Qi, Yiluan Guo, Sherry X. Chen, Tao Feng, Kateryna Pistunova, Yishan Shen, Lili Su, Jiaqi Ma

General AI

Vision-Language-Action (VLA) models offer a promising autonomous driving paradigm for leveraging world knowledge and reasoning capabilities, especially in long-tail scenarios. However, existing VLA models often struggle with the high latency in action generation using an autoregressive generation framework and exhibit …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Can "AI" Be a Doctor? A Study of Empathy, Readability, and Alignment in Clinical LLMs

2026-04-22 · Mariano Barone, Francesco Di Serio, Roberto Moio, Marco Postiglione, Giuseppe Riccio, Antonio Romano, Vincenzo Moscato

General AI

Large Language Models (LLMs) are increasingly deployed in healthcare, yet their communicative alignment with clinical standards remains insufficiently quantified. We conduct a multidimensional evaluation of general-purpose and domain-specialized LLMs across structured medical explanations and real-world physician-patie…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

2026-04-22 · Hyeonwoo Kim, Jeonghwan Kim, Kyungwon Cho, Hanbyul Joo

General AI

Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object categories, including complex dexterous manipulations that are difficult to capture with motion capture systems. While the rich interaction knowledge embedded in these…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

SWE-chat: Coding Agent Interactions From Real Users in the Wild

2026-04-22 · Joachim Baumann, Vishakh Padmakumar, Xiang Li, John Yang, Diyi Yang, Sanmi Koyejo

General AI

AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild. The dataset currently contai…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

2026-04-22 · Yiming Bian, Joshua M. Akey

General AI

The scalability of long-context large language models is fundamentally limited by the quadratic memory cost of exact self-attention, which often leads to out-of-memory (OOM) failures on modern hardware. Existing methods improve memory efficiency to near-linear complexity, while assuming that the full query, key, and va…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

OptiMat Alloys: A FAIR End-to-End Agent with Living Database for Computational Multi-Principal Alloy Exploration

2026-04-23 · Yang Hu, Vladyslav Turlo

General AI

The FAIR principles have transformed how computational data and workflows are shared in materials research, yet existing repositories can only serve pre-computed entries -- broad coverage is perpetually incomplete and cannot adapt to new questions on demand. To address these challenges, we present OptiMat Alloys, a lar…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference

2026-04-24 · Yuqi Pan, Jinghao Zhuang, Yupeng Feng, Fangzhi Zhong, Siyu Ding, Xuerui Qiu, Shaowei Gu, Bohan Sun, Zhiyong Qin, Yibo Zhong, Lingtao Ouyang, Kun Yang, Zehao Liu, Yuhong Chou, Shurong Wang, Anjie Hu, Han Xu, Bo Xu, Guoqi Li

General AI

Scaling context length is reshaping large-model development, yet full-attention Transformers suffer from prohibitive computation and inference bottlenecks at long sequences. A key challenge is to design foundation models that maintain performance and long-context efficiency with minimal training overhead. We introduce …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought

2026-04-24 · Keshav Ramji, Tahira Naseem, Ramón Fernandez Astudillo

General AI

While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate during inference. Non-verbal reasoning methods have emerged with shorter generation lengths by leveraging continuous representations, yet their performance lags behind verbalized CoT. We propose $\…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters

2026-04-27 · Aaryan Shah, Andrew Hines, Alexia Downs, Denis Bajet, Paulius Mui, Fabiano Araujo, Laura Offutt, Aida Rutledge, Elizabeth Jimenez

General AI

Objective. Clinical AI documentation systems require evaluation methodologies that are clinically valid, economically viable, and sensitive to iterative changes. Methods requiring expert review per scoring instance are too slow and expensive for safe, iterative deployment. We present a case-specific, clinician-authored…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Contextual Linear Activation Steering of Language Models

2026-04-27 · Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin

General AI

Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input pro…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift

2026-04-27 · Lixian Chen, Mingxuan Huang, Yanhui Chen, Junyi Lin, Yang Shi

General AI

Vision-language models transfer well in zero-shot settings, but at deployment the visual and textual branches often shift asymmetrically. Under this condition, entropy-based test-time adaptation can sharpen the fused posterior while increasing error, because an unreliable modality may still dominate fusion. We study th…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

2026-04-27 · Zhiheng Liu, Weiming Ren, Xiaoke Huang, Shoufa Chen, Tianhong Li, Mengzhao Chen, Yatai Ji, Sen He, Jonas Schult, Belinda Zeng, Tao Xiang, Wenhu Chen, Ping Luo, Luke Zettlemoyer, Yuren Cong

General AI

Unified multimodal models typically rely on pretrained vision encoders and use separate visual representations for understanding and generation, creating misalignment between the two tasks and preventing fully end-to-end optimization from raw pixels. We introduce Tuna-2, a native unified multimodal model that performs …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Efficient and Adaptive Human Activity Recognition via LLM Backbones

2026-05-12 · Aleksandr Bredikhin, Philippe Lalanda, German Vega

General AI

Human Activity Recognition (HAR) is a core task in pervasive computing systems, where models must operate under strict computational constraints while remaining robust to heterogeneous and evolving deployment conditions. Recent advances based on Transformer architectures have significantly improved recognition performa…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

2026-05-12 · Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, Weiyang Liu

General AI

We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular values throughout traini…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Predicting Disagreement with Human Raters in LLM-as-a-Judge Difficulty Assessment without Using Generation-Time Probability Signals

2026-05-12 · Yo Ehara

General AI

Automatic generation of educational materials using large language models (LLMs) is becoming increasingly common, but assigning difficulty levels to such materials still requires substantial human effort. LLM-as-a-Judge has therefore attracted attention, yet disagreement with human raters remains a major challenge. We …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

2026-05-12 · Ariel Gera, Shir Ashury-Tahan, Gal Bloch, Ohad Eytan, Assaf Toledo

General AI

We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling e…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.3

U-STS-LLM: A Unified Spatio-Temporal Steered Large Language Model for Traffic Prediction and Imputation

2026-05-12 · Yichen Zhang, Jun Li

General AI

The efficient operation of modern cellular networks hinges on the accurate analysis of spatio-temporal traffic data. Mastering these patterns is essential for core network functions, chiefly forecasting future load to pre-empt congestion and imputing missing values caused by sensor failures or transmission errors to en…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

Characterizing the Consistency of the Emergent Misalignment Persona

2026-04-30 · Anietta Weckauff, Yuchen Zhang, Maksym Andriushchenko

General AI

Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlation between harmful behavior and self-assessment in emergently misaligned models, it remains unclear how consistent this c…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

2026-04-30 · Yujun Wu, Dongxu Zhang, Xinchen Li, Jinhang Xu, Yiling Duan, Yumou Liu, Jiabao Pan, Xuanhe Zhou, Jingxuan Wei, Siyuan Li, Jintao Chen, Conghui He, Cheng Tan

General AI

Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one anothe…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

MIFair: A Mutual-Information Framework for Intersectionality and Multiclass Fairness

2026-04-30 · Jeanne Monnier, Thomas George, Frédéric Guyard, Christèle Tarnec, Marios Kountouris

General AI

Fairness in machine learning remains challenging due to its ethical complexity, the absence of a universal definition, and the need for context-specific bias metrics. Existing methods still struggle with intersectionality, multiclass settings, and limited flexibility and generality. To address these gaps, we introduce …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

Multisensory learning recruits visual neurons into an olfactory memory engram

2026-04-30 · Zeynep Okray, Nils Otto, Anna A. Cook, Clifford Talbot, Ashwin Miriyala, Martín Klappenbach, Ciara Stern, Kieran Desmond, Paola Vargas-Gutierrez, Scott Waddell

General AI

Associating multiple sensory cues with a single experience or object is a fundamental process that improves object recognition and memory performance. However, neural mechanisms that bind sensory features during learning and augment memory expression are unknown. Here we demonstrate multisensory appetitive and aversive…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

On the Proper Treatment of Units in Surprisal Theory

2026-04-30 · Samuel Kiegeland, Vésteinn Snæbjarnarson, Tim Vieira, Ryan Cotterell

General AI

Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a unit underspecified. In practice, experimental stimuli are segmented into linguistically motivated units (e.g., words), while pretrained language models assign probability…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

Modeling Subjective Urban Perception with Human Gaze

2026-05-01 · Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer

General AI

Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed.…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

2026-05-01 · Alfredo Madrid-García, Miguel Rujas

General AI

Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls. Objective: To re…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

SCALE-LoRA: Auditing Post-Retrieval LoRA Composition with Residual Merging and View Reliability

2026-05-02 · Shuaipeng Zhou, Yu Zhang

General AI

Libraries of Low-Rank Adaptation (LoRA) adapters are becoming a practical by-product of parameter-efficient adaptation. Once such adapters accumulate, a natural question is no longer how to train one adapter for one task, but how to reuse an open pool of adapters for a new task given only a small support set. Prior wor…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

FunFuzz: An LLM-Powered Evolutionary Fuzzing Framework

2026-05-04 · Mario Rodríguez Béjar, B. Romera-Paredes, Jose L. Hernández-Ramos

General AI

Modern fuzzers increasingly use Large Language Models (LLMs) to generate structured inputs, but LLM-driven fuzzing is sensitive to prompt initialization and sampling variance, which can reduce exploration efficiency and lead to redundant inputs. We present FunFuzz, a multi-island evolutionary fuzzing framework that run…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.2

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

2026-05-04 · Shikhar Shukla

General AI

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $γ$, which determines how many tokens the draft model proposes per step. Nearly all exis…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

Make Geometry Matter for Spatial Reasoning

2026-03-27 · Shihua Zhang, Qiuhong Shen, Shizun Wang, Tianbo Pan, Xinchao Wang

General AI

Empowered by large-scale training, vision-language models (VLMs) achieve strong image and video understanding, yet their ability to perform spatial reasoning in both static scenes and dynamic videos remains limited. Recent advances try to handle this limitation by injecting geometry tokens from pretrained 3D foundation…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation

2026-03-30 · Bharath Krishnamurthy, Ajita Rattani

General AI

Recent multimodal face generation models address the spatial control limitations of text-to-image diffusion models by augmenting text-based conditioning with spatial priors such as segmentation masks, sketches, or edge maps. This multimodal fusion enables controllable synthesis aligned with both high-level semantic int…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

Therefore I am. I Think

2026-04-02 · Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani

General AI

We consider the question: when a large language reasoning model makes a choice, did it think first and then decide to, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

AURA: Always-On Understanding and Real-Time Assistance via Video Streams

2026-04-05 · Xudong Lu, Yang Bo, Jinpeng Chen, Shuhan Li, Xintong Guo, Huankang Guan, Fang Liu, Dunyuan Xu, Peiwen Sun, Heyang Sun, Rui Liu, Hongsheng Li

General AI

Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress, yet current approach…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

2026-04-06 · Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye

General AI

We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. Unlike traditional GPU-centric systems, MegaTrain stores parameters and optimizer states in host memory (CPU memory) and treats GPUs as transient compute engines. For each layer…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

2026-04-08 · Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu

General AI

A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jointly shaped by opti…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

The Last Harness You'll Ever Build

2026-04-22 · Haebin Seong, Li Yin, Haoran Zhang

General AI

AI agents are increasingly deployed on complex, domain-specific workflows -- navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling c…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think

2026-04-25 · Bingda Tang, Yuhui Zhang, Xiaohan Wang, Jiayuan Mao, Ludwig Schmidt, Serena Yeung-Levy

General AI

Aligning denoising generative models with human preferences or verifiable rewards remains a key challenge. While policy-gradient online reinforcement learning (RL) offers a principled post-training framework, its direct application is hindered by the intractable likelihoods of these models. Prior work therefore either …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

2026-04-26 · Zhen Ye, Xu Tan, Aoxiong Yin, Hongzhan Lin, Guangyan Zhang, Peiwen Sun, Yiming Li, Chi-Min Chan, Wei Ye, Shikun Zhang, Wei Xue

General AI

Joint audio-video generation models have shown that unified generation yields stronger cross-modal coherence than cascaded approaches. However, existing models couple modalities throughout denoising via pervasive attention, treating high-level semantics and low-level details in a fully entangled manner. This is subopti…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

A Survey on LLM-based Conversational User Simulation

2026-04-27 · Bo Ni, Leyao Wang, Yu Wang, Branislav Kveton, Franck Dernoncourt, Yu Xia, Hongjie Chen, Reuben Leura, Samyadeep Basu, Subhojyoti Mukherjee, Puneet Mathur, Nesreen Ahmed, Junda Wu, Li Li, Huixin Zhang, Ruiyi Zhang, Tong Yu, Sungchul Kim, Jiuxiang Gu, Zhengzhong Tu, Alexa Siu, Zichao Wang, David Seunghyun Yoon, Nedim Lipka, Namyong Park, Zihao Lin, Trung Bui, Yue Zhao, Tyler Derr, Ryan A. Rossi

General AI

User simulation has long played a vital role in computer science due to its potential to support a wide range of applications. Language, as the primary medium of human communication, forms the foundation of social interaction and behavior. Consequently, simulating conversational behavior has become a key area of study.…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

A Systematic Post-Train Framework for Video Generation

2026-04-28 · Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, Yuming Li, Xiaoxuan He, Mengzhao Chen, Haoyang Huang, Nan Duan, Ping Luo

General AI

While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deployment requirements due to critical issues such as prompt sensitivity, temporal inconsistency…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

2026-04-28 · Jiayi Guo, Linqing Wang, Jiangshan Wang, Yang Yue, Zeyu Liu, Zhiyuan Zhao, Qinglin Lu, Gao Huang, Chunyu Wang

General AI

Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their initial generation, potentially extending the performance upper bound. Current UMM-based refinement methods primarily…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

2026-05-06 · Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen, Jun Zhu

General AI

LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

PianoCoRe: Combined and Refined Piano MIDI Dataset

2026-05-07 · Ilya Borovik

General AI

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-sc…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance

2026-05-07 · Ziyun Zeng, Yiqi Lin, Guoqiang Liang, Mike Zheng Shou

General AI

In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly available datasets predominantly focus on local editing or style transfer, which largely preserve the original scene structure and are easier to scale. In contrast, Backgroun…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

2026-03-26 · Xiaofeng Mao, Shaohao Rui, Kaining Ying, Bo Zheng, Chuanhao Li, Mingmin Chi, Kaipeng Zhang

General AI

Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?

2026-03-31 · Max Kaufmann, David Lindner, Roland S. Zimmermann, Rohin Shah

General AI

Chain-of-Thought (CoT) monitoring, in which automated systems monitor the CoT of an LLM, is a promising approach for effectively overseeing AI systems. However, the extent to which a model's CoT helps us oversee the model - the monitorability of the CoT - can be affected by training, for instance by the model learning …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation

2026-03-31 · Nathan Heath

General AI

Myopic Optimization with Non-myopic Approval (MONA) mitigates multi-step reward hacking by restricting the agent's planning horizon while supplying far-sighted approval as a training signal \cite{farquhar2025mona}. The original paper identifies a critical open question: how the method of constructing approval -- partic…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Phyelds: A Pythonic Framework for Aggregate Computing

2026-03-31 · Gianluca Aguzzi, Davide Domini, Nicolas Farabegoli, Mirko Viroli

General AI

Aggregate programming is a field-based coordination paradigm with over a decade of exploration and successful applications across domains including sensor networks, robotics, and IoT, with implementations in various programming languages, such as Protelis, ScaFi (Scala), and FCPP (C++). A recent research direction inte…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Quantifying Cross-Modal Interactions in Multimodal Glioma Survival Prediction via InterSHAP: Evidence for Additive Signal Integration

2026-03-31 · Iain Swift, JingHua Ye, Ruairi O'Reilly

General AI

Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

RAAP: Retrieval-Augmented Affordance Prediction with Cross-Image Action Alignment

2026-03-31 · Qiyuan Zhuang, He-Yang Xu, Yijun Wang, Xin-Yang Zhao, Yang-Yang Li, Xiu-Shen Wei

General AI

Understanding object affordances is essential for enabling robots to perform purposeful and fine-grained interactions in diverse and unstructured environments. However, existing approaches either rely on retrieval, which is fragile due to sparsity and coverage gaps, or on large-scale models, which frequently mislocaliz…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Reward-Based Online LLM Routing via NeuralUCB

2026-03-31 · Ming-Hua Tsai, Phat Tran

General AI

This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and e…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Trimodal Deep Learning for Glioma Survival Prediction: A Feasibility Study Integrating Histopathology, Gene Expression, and MRI

2026-03-31 · Iain Swift, JingHua Ye

General AI

Multimodal deep learning has improved prognostic accuracy for brain tumours by integrating histopathology and genomic data, yet the contribution of volumetric MRI within unified survival frameworks remains unexplored. This pilot study extends a bimodal framework by incorporating Fluid Attenuated Inversion Recovery (FLA…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Best-Arm Identification with Noisy Actuation

2026-04-02 · Merve Karakas, Osama Hanna, Lin F. Yang, Christina Fragouli

General AI

In this paper, we consider a multi-armed bandit (MAB) instance and study how to identify the best arm when arm commands are conveyed from a central learner to a distributed agent over a discrete memoryless channel (DMC). Depending on the agent capabilities, we provide communication schemes along with their analysis, wh…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation

2026-04-02 · Daiwei Chen, Zhoutong Fu, Chengming Jiang, Haichao Zhang, Ran Zhou, Tan Wang, Chunnan Yao, Guoyao Li, Rui Cai, Yihan Cao, Ruijie Jiang, Fedor Borisyuk, Jianqiang Shen, Jingwei Wu, Ramya Korlakai Vinayak

General AI

Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies on supervised fine-tuning to learn their …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Modulate-and-Map: Crossmodal Feature Mapping with Cross-View Modulation for 3D Anomaly Detection

2026-04-02 · Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano

General AI

We present ModMap, a natively multiview and multimodal framework for 3D anomaly detection and segmentation. Unlike existing methods that process views independently, our method draws inspiration from the crossmodal feature mapping paradigm to learn to map features across both modalities and views, while explicitly mode…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

How AI Aggregation Affects Knowledge

2026-04-06 · Daron Acemoglu, Tianyi Lin, Asuman Ozdaglar, James Siderius

General AI

Artificial intelligence (AI) changes social learning when aggregated outputs become training data for future predictions. To study this, we extend the DeGroot model by introducing an AI aggregator that trains on population beliefs and feeds synthesized signals back to agents. We define the learning gap as the deviation…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Stratifying Reinforcement Learning with Signal Temporal Logic

2026-04-06 · Justin Curry, Alberto Speranzon

General AI

In this paper, we develop a stratification-based semantics for Signal Temporal Logic (STL) in which each atomic predicate is interpreted as a membership test in a stratified space. This perspective reveals a novel correspondence principle between stratification theory and STL, showing that most STL formulas can be view…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads

2026-04-07 · Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang

General AI

Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly sensitive to configuration choices. In practice, this leads to many concurrent LoRA …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

CrowdVLA: Embodied Vision-Language-Action Agents for Context-Aware Crowd Simulation

2026-04-07 · Juyeong Hwang, Seong-Eun Hong, Jinhyun Kim, JaeYoung Seon, Giljoo Nam, Hanyoung Jang, HyeongYeop Kang

General AI

Crowds do not merely move; they decide. Human navigation is inherently contextual: people interpret the meaning of space, social norms, and potential consequences before acting. Sidewalks invite walking, crosswalks invite crossing, and deviations are weighed against urgency and safety. Yet most crowd simulation methods…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Mixture-of-Modality-Experts with Holistic Token Learning for Fine-Grained Multimodal Visual Analytics in Driver Action Recognition

2026-04-07 · Tianyi Liu, Yiming Li, Wenqian Wang, Jiaojiao Wang, Chen Cai, Yi Wang, Kim-Hui Yap

General AI

Robust multimodal visual analytics remains challenging when heterogeneous modalities provide complementary but input-dependent evidence for decision-making. Existing multimodal learning methods mainly rely on fixed fusion modules or predefined cross-modal interactions, which are often insufficient to adapt to changing m…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Density-Driven Optimal Control: Convergence Guarantees for Stochastic LTI Multi-Agent Systems

2026-04-09 · Kooktae Lee

General AI

This paper addresses the decentralized non-uniform area coverage problem for multi-agent systems, a critical task in missions with high spatial priority and resource constraints. While existing density-based methods often rely on computationally heavy Eulerian PDE solvers or heuristic planning, we propose Stochastic De…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding

2026-04-09 · Joungbin An, Agrim Jain, Kristen Grauman

General AI

Video temporal grounding (VTG) is typically tackled with dataset-specific models that transfer poorly across domains and query styles. Recent efforts to overcome this limitation have adapted large multimodal language models (MLLMs) to VTG, but their high compute cost and limited video context still hinder long-video gr…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

What They Saw, Not Just Where They Looked: Semantic Scanpath Similarity via VLMs and NLP metric

2026-04-09 · Mohamed Amine Kerkouri, Marouane Tliba, Bin Wang, Aladine Chetouani, Ulas Bagci, Alessandro Bruno

General AI

Scanpath similarity metrics are central to eye-movement research, yet existing methods predominantly evaluate spatial and temporal alignment while neglecting semantic equivalence between attended image regions. We present a semantic scanpath similarity framework that integrates vision-language models (VLMs) into eye-tr…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

The Nonverbal Syntax Framework: An Evidence-Based Tiered System for Inferring Learner States from Observable Behavioral Cues

2026-04-28 · Sherzod Turaev, Mary John, Jaloliddin Rustamov, Zahiriddin Rustamov, Saja Aldabet, Nazar Zaki, Khaled Shuaib

General AI

Understanding learners' cognitive and affective states underpins adaptive educational systems and effective teaching. Although research links nonverbal cues to internal states, no framework calibrates them to evidence. We present the Nonverbal Syntax Framework, drawn from a systematic review of 908 studies and 17,043 c…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

2026-05-07 · Sushant Gautam, Finn Schwall, Annika Willoch Olstad, Fernando Vallecillos Ruiz, Birk Torpmann-Hagen, Sunniva Maria Stordal Bjørklund, Leon Moonen, Klas Pettersen, Michael A. Riegler

General AI

Many deployments must compare candidate language models for safety before a labeled benchmark exists for the relevant language, sector, or regulatory regime. We formalize this setting as benchmarkless comparative safety scoring and specify the contract under which a scenario-based audit can be interpreted as deployment…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

2026-05-07 · Jai Moondra, Ayela Chughtai, Bhargavi Lanka, Swati Gupta

General AI

Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of the decisive votes c…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.5

Introspective Diffusion Language Models

2026-04-13 · Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu

General AI

Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.5

Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction

2026-04-13 · Efstathios Karypidis, Spyros Gidaris, Nikos Komodakis

General AI

Accurate future video prediction requires both high visual fidelity and consistent scene semantics, particularly in complex dynamic environments such as autonomous driving. We present Re2Pix, a hierarchical video prediction framework that decomposes forecasting into two stages: semantic representation prediction and re…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.5

Statehood Without Capacity

2026-04-13 · Rok Spruk

Research Track A

This paper develops a political-economy theory of statehood without capacity. I argue that under specific institutional and geopolitical conditions, a polity can become trapped in an equilibrium of nominal statehood: a state in which claims to sovereignty, external recognition, and symbolic legitimacy persist or even s…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.5

C2: Scalable Rubric-Augmented Reward Modeling from Binary Preferences

2026-04-15 · Akira Kawabata, Saku Sugawara

General AI

Rubric-augmented verification guides reward models with explicit evaluation criteria, yielding more reliable judgments than single-model verification. However, most existing methods require costly rubric annotations, limiting scalability. Moreover, we find that rubric generation is vulnerable to a failure of cooperatio…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.5

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

2026-04-17 · Jiaxi Bi, Tongxu Luo, Wenyu Du, Zhengyang Tang, Benyou Wang

General AI

Parallel reasoning enhances Large Reasoning Models (LRMs) but incurs prohibitive costs due to futile paths caused by early errors. To mitigate this, path pruning at the prefix level is essential, yet existing research remains fragmented without a standardized framework. In this work, we propose the first systematic tax…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.5

From Papers to Progress: Rethinking Knowledge Accumulation in Software Engineering

2026-04-17 · Jason Cusati, Chris Brown

Research Track A

Software engineering research has experienced rapid growth in both output and participation over the past decades. Yet concerns persist about the field's ability to accumulate, integrate, and reuse knowledge in ways that support long-term progress. To better understand how the community itself perceives these challenge…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.5

Analyzing Process Data from Computer-Based Assessments: A Tutorial on Preprocessing, Feature Extraction, and Model-Based Inference

2026-04-18 · Daeun Hwangbo, Junyeong Park, Minjeong Jeon, Ick Hoon Jin

Research Track A

Computer-based assessments routinely generate detailed interaction logs -- commonly referred to as process data -- that record every action a respondent performs during task completion, yet systematic preprocessing guidance, integrated analytical workflows, and cross-method consistency checks remain scarce in the liter…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.5

Can Institutional Integration of Western Balkans Stock Exchanges Strengthen Monetary Transmission?

2026-04-20 · Stefan Tanevski

Research Track A

This paper asks how institutional stock-market integration reshapes the transmission of monetary policy through asset prices in small open economies. Motivated by the persistent segmentation of Western Balkan capital markets, we develop a two-stage counterfactual transmission framework to identify how stock-exchange co…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.4

ViPO: Visual Preference Optimization at Scale

2026-04-29 · Ming Li, Jie Wu, Justin Cui, Xiaojie Li, Rui Wang, Chen Chen

General AI

While preference optimization is crucial for improving visual generative models, how to effectively scale this paradigm remains largely unexplored. Current open-source preference datasets contain conflicting preference patterns, where winners excel in some dimensions but underperform in others. Naively optimizing on su…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.4

Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling

2026-04-30 · Ansar Aynetdinov, Patrick Haller, Alan Akbik

General AI

Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for high-resource non-English languages like German, French, or Japanese, aggressive filtering creates a strategic dilemma: should practitioners prioritize diversity by tra…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.4

Generative Modeling with Orbit-Space Particle Flow Matching

2026-05-04 · Sinan Wang, Jinjin He, Shenyifan Lu, Ruicheng Wang, Greg Turk, Bo Zhu

General AI

We present Orbit-Space Geometric Probability Paths (OGPP), a particle-native flow-matching framework for generative modeling of particle systems. OGPP is motivated by two insights: (i) particles are defined up to permutation symmetries, so anonymous indexing inflates per-index target variance and yields curved, hard-to…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

A Little Rank Goes a Long Way: Random Scaffolds with LoRA Adapters Are All You Need

2026-04-09 · Hananel Hazan, Yanbo Zhang, Benedikt Hartl, Michael Levin

General AI

How many of a neural network's parameters actually encode task-specific information? We investigate this question with LottaLoRA, a training paradigm in which every backbone weight is drawn at random and frozen; only low-rank LoRA adapters are trained. Across nine benchmarks spanning diverse architecture families from …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift

2026-04-10 · Harshith Kethavath, Weiming Hu

General AI

Adapting vision-language models to remote sensing imagery presents a fundamental challenge: both the visual and linguistic distributions of satellite data lie far outside natural image pretraining corpora. Despite this, prompting remains the dominant deployment paradigm, driven by the assumption that domain-specific la…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

FishRoPE: Projective Rotary Position Embeddings for Omnidirectional Visual Perception

2026-04-12 · Rahul Ahuja, Mudit Jain, Bala Murali Manoghar Sai Sudhakar, Venkatraman Narayanan, Pratik Likhar, Varun Ravi Kumar, Senthil Yogamani

General AI

Vision foundation models (VFMs) and Bird's Eye View (BEV) representation have advanced visual perception substantially, yet their internal spatial representations assume the rectilinear geometry of pinhole cameras. Fisheye cameras, widely deployed on production autonomous vehicles for their surround-view coverage, exhi…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

A Mamba-Based Multimodal Network for Multiscale Blast-Induced Rapid Structural Damage Assessment

2026-04-13 · Wanli Ma, Sivasakthy Selvakumaran, Dain G. Farrimond, Adam A. Dennis, Samuel E. Rigby

General AI

Accurate and rapid structural damage assessment (SDA) is crucial for post-disaster management, helping responders prioritise resources, plan rescues, and support recovery. Traditional field inspections, though precise, are limited by accessibility, safety risks, and time constraints, especially after large explosions. …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net

2026-04-13 · Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono

General AI

Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep learning-based auto-segmentation can reduce workload, safe clinical de…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Psychological Concept Neurons: Can Neural Control Bias Probing and Shift Generation in LLMs?

2026-04-13 · Yuto Harada, Hiro Taiyo Hamada

General AI

Using psychological constructs such as the Big Five, large language models (LLMs) can imitate specific personality profiles and predict a user's personality. While LLMs can exhibit behaviors consistent with these constructs, it remains unclear where and how they are represented inside the model and how they relate to b…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Generative Refinement Networks for Visual Synthesis

2026-04-14 · Jian Han, Jinlai Liu, Jiahuan Wang, Bingyue Peng, Zehuan Yuan

General AI

While diffusion models dominate the field of visual generation, they are computationally inefficient, applying uniform computational effort regardless of varying complexity. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but are often hindered by…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Representation geometry shapes task performance in vision-language modeling for CT enterography

2026-04-14 · Cristian Minoccheri, Emily Wittrup, Kayvan Najarian, Ryan Stidham

General AI

Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (IBD), yet the representational choices that best support automated analysis of this modality are unknown. We present the first study of vision-language transfer learning on abdominal CT enterography and identif…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

2026-04-14 · Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding

General AI

On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We first identify that two conditions govern whether OPD succeeds or fails: (i) the s…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance

2026-04-16 · Jack Wei Lun Shi, Minghao Dang, Wawan Solihin, Justin K. W. Yeoh

General AI

Existing research on large language models (LLMs) for automated code compliance has primarily focused on performance, treating the models as black boxes and overlooking how training decisions affect their interpretive behavior. This paper addresses this gap by employing a perturbation-based attribution analysis to comp…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

MOMENTA: Mixture-of-Experts Over Multimodal Embeddings with Neural Temporal Aggregation for Misinformation Detection

2026-04-17 · Yeganeh Abdollahinejad, Ahmad Mousavi, Naeemul Hassan, Kai Shu, Nathalie Japkowicz, Shahriar Khosravi, Amir Karami

General AI

The widespread dissemination of multimodal content on social media has made misinformation detection increasingly challenging, as misleading narratives often arise not only from textual or visual content alone, but also from semantic inconsistencies between modalities and their evolution over time. Existing multimodal …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

No Universal Courtesy: A Cross-Linguistic, Multi-Model Study of Politeness Effects on LLMs Using the PLUM Corpus

2026-04-17 · Hitesh Mehta, Arjit Saxena, Garima Chhikara, Rohit Kumar

Research Track A · General AI

This paper explores the response of Large Language Models (LLMs) to user prompts with different degrees of politeness and impoliteness. The Politeness Theory by Brown and Levinson and the Impoliteness Framework by Culpeper form the basis of experiments conducted across three languages (English, Hindi, Spanish), five mo…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

PIIBench: A Unified Multi-Source Benchmark Corpus for Personally Identifiable Information Detection

2026-04-17 · Pritesh Jha

General AI

We present PIIBench, a unified benchmark corpus for Personally Identifiable Information (PII) detection in natural language text. Existing resources for PII detection are fragmented across domain-specific corpora with mutually incompatible annotation schemes, preventing systematic comparison of detection systems. We co…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Repurposing 3D Generative Model for Autoregressive Layout Generation

2026-04-17 · Haoran Feng, Yifan Niu, Zehuan Huang, Yang-Tian Sun, Chunchao Guo, Yuxin Peng, Lu Sheng

General AI

We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in the native 3D space, formulating layout generation as an autoregressive process that explicitly models geometric rela…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Spinning Living Crystals of Run-and-Tumble Particles with Environmental Feedback

2026-04-17 · Maks Pečnik Bambič, Nuno A. M. Araújo, Giorgio Volpe

General AI

Collective rotations are common in active matter, enhancing cohesion, transport, and mixing. They are typically attributed to chiral non-reciprocal dynamics due to intrinsic particle chirality, torque-generating interactions among units, or geometric confinement. Here, we uncover a different mechanism for rotational or…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition

2026-04-19 · Nwe Ni Win, Jim Basilakis, Steven Thomas, Seyhan Yazar, Laura Pierce, Stephanie Liu, Paul M. Middleton, Nasser Ghadiri, X. Rosalind Wang

General AI

Extracting clinically relevant information from unstructured medical narratives such as admission notes, discharge summaries, and emergency case histories remains a challenge in clinical natural language processing (NLP). Medical Entity Recognition (MER) identifies meaningful concepts embedded in these records. Recent …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints

2026-04-20 · Hao Meng, Siyuan Zheng, Shuran Zhou, Qiangqiang Wang, Yang Song

General AI

Large Language Models (LLMs) show promise in lyric-to-melody generation, but models trained with Supervised Fine-Tuning (SFT) often produce musically implausible melodies with issues like poor rhythm and unsuitable vocal ranges, a phenomenon we term "constraint violation". To address this, we propose a novel alignment …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Bounded Ratio Reinforcement Learning

2026-04-20 · Yunke Ao, Le Chen, Bruce D. Lee, Assefa S. Wahd, Aline Czarnobai, Philipp Fürnstahl, Bernhard Schölkopf, Andreas Krause

General AI

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying foundations of trust region methods and the heuristic clipped objective used in P…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance

2026-04-20 · Terence Lim, Kumar Muthuraman, Michael Sury

General AI

We introduce a multi-agent framework intended to emulate parts of a quantitative research team and support equity factor research on large financial panel datasets. QRAFTI integrates a research toolkit for panel data with MCP servers that expose data access, factor construction, and custom coding operations as callable…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

A Gesture-Based Visual Learning Model for Acoustophoretic Interactions using a Swarm of AcoustoBots

2026-04-21 · Alex Lin, Lei Gao, Narsimlu Kemsaram, Sriram Subramanian

General AI

AcoustoBots are mobile acoustophoretic robots capable of delivering mid-air haptics, directional audio, and acoustic levitation, but existing implementations rely on scripted commands and lack an intuitive interface for real-time human control. This work presents a gesture-based visual learning framework for contactles…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

An AI Agent Execution Environment to Safeguard User Data

2026-04-21 · Robert Stanley, Avi Verma, Lillian Tsai, Konstantinos Kallas, Sam Kumar

General AI

AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g., personal and financial information). This poses a serious risk to security and privacy. Adversaries may attack the AI model (e.g., via prompt injection) to exfiltrate user da…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Global Hopf Bifurcation and Symmetric Periodic Solutions in Multi-Agent Systems with Neutral Distributed Delays

2026-04-22 · Casey Crane

General AI

We study the emergence of symmetric oscillatory behavior in multi-agent systems where each agent incorporates a continuous memory of its past states and past rates of change, modeled by distributed retarded and neutral delays. The closed-loop dynamics are described by a system of nonlinear neutral functional differenti…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Global Offshore Wind Infrastructure: Deployment and Operational Dynamics from Dense Sentinel-1 Time Series

2026-04-22 · Thorsten Hoeser, Felix Bachofer, Claudia Kuenzer

General AI

The offshore wind energy sector is expanding rapidly, increasing the need for independent, high-temporal-resolution monitoring of infrastructure deployment and operation at global scale. While Earth Observation based offshore wind infrastructure mapping has matured for spatial localization, existing open datasets lack …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Relative Principals, Pluralistic Alignment, and the Structural Value Alignment Problem

2026-04-22 · Travis LaCroix

General AI

The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but w…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation

2026-04-22 · Ruohan Liu, Shukang Yin, Tao Wang, Dong Zhang, Weiji Zhuang, Shuhuai Ren, Ran He, Caifeng Shan, Chaoyou Fu

General AI

Paralinguistic cues are essential for natural human-computer interaction, yet their evaluation in Large Audio-Language Models (LALMs) remains limited by coarse feature coverage and the inherent subjectivity of assessment. To address these challenges, we introduce SpeechParaling-Bench, a comprehensive benchmark for para…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Evaluation of Automatic Speech Recognition Using Generative Large Language Models

2026-04-23 · Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil, Sergio Burdisso, Petr Motlicek, Shiran Liu, Mickael Rouvier, Jane Wottawa, Richard Dufour

General AI

Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This paper evaluates their …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

First measurement of wind line formation regions in an early O-type star

2026-04-23 · D. Pauli, T. N. Parsons, R. K. Prinja

General AI

Massive stars with their strong ionizing radiation and strong stellar winds are the key feedback agents of the universe. Stellar winds of massive stars are often measured by fitting resonance lines in the UV using non-LTE stellar atmosphere models. So far, the line formation regions of these lines have not been measure…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation

2026-04-23 · Bartosz Balis, Michal Orzechowski, Piotr Kica, Michal Dygas, Michal Kuszewski

General AI

Scientific workflow systems automate execution -- scheduling, fault tolerance, resource management -- but not the semantic translation that precedes it. Scientists still manually convert research questions into workflow specifications, a task requiring both domain knowledge and infrastructure expertise. We propose an a…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms

2026-04-23 · Yuto Nishida, Naoki Shikoda, Yosuke Kishinami, Ryo Fujii, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe

General AI

Understanding what kinds of factual knowledge large language models (LLMs) memorize is essential for evaluating their reliability and limitations. Entity-based QA is a common framework for analyzing non-verbatim memorization, but typical evaluations query each entity using a single canonical surface form, making it dif…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Scalable Multimodal Beam Alignment in V2X: An Anti-Imbalance Graph Learning Approach

2026-04-23 · Jiahui Liang, Shuoyao Wang, Shijian Gao

General AI

Efficient beam alignment is fundamental to high-throughput and reliable connectivity in Vehicle-to-Everything (V2X) systems. However, conventional beam management in dynamic vehicular topologies incurs prohibitive alignment overhead and struggles to maintain robust links under rapid mobility. To overcome these challeng…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication

2026-04-23 · Haolin Zhang, William Reber, Yuxuan Zhang, Guofei Gu, Jeff Huang

General AI

Modern phishing campaigns increasingly evade snapshot-based URL classifiers using interaction gates (e.g., checkbox/slider challenges), delayed content rendering, and logo-less credential harvesters. This shifts URL triage from static classification toward an interactive forensics task: an analyst must actively navigat…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Dharma, Data and Deception: An LLM-Powered Rhetorical Analysis of Cow-Urine Health Claims on YouTube

2026-04-24 · Sheza Munir, Ratna Kandala, Anamta Khan, Deepti, Joyojeet Pal

General AI

Health misinformation remains one of the most pressing challenges on social media, particularly when cultural traditions intersect with scientific-sounding claims. These dynamics are not only global but also deeply local, manifesting in culturally specific controversies that require careful analysis. Motivated by this,…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification

2026-04-24 · Md Erfan, Md Kamal Hossain Chowdhury, Ahmed Ryan, Md Rayhanur Rahman

General AI

Large Language Models (LLMs) show promise in automated software engineering, yet their guarantee of correctness is frequently undermined by erroneous or hallucinated code. To enforce model honesty, formal verification requires LLMs to synthesize implementation logic alongside formal specifications that are subsequently…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Inter-Stance: A Dyadic Multimodal Corpus for Conversational Stance Analysis

2026-04-24 · Xiang Zhang, Xiaotian Li, Taoyue Wang, Nan Bi, Xin Zhou, Cody Zhou, Zoie Wang, Andrew Yang, Yuming Su, Jeff Cohn, Qiang Ji, Lijun Yin

General AI

Social interactions dominate our perceptions of the world and shape our daily behavior by attaching social meaning to acts as simple and spontaneous as gestures, facial expressions, voice, and speech. People mimic and otherwise respond to each other's postures, facial expressions, mannerisms, and other verbal and nonve…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

PASS: A Provenanced Access Subaccount System for Blockchain Wallets

2026-04-24 · Jay Yu, Shunfan Zhou, Hang Yin, Brian Seong

General AI

Blockchain wallets conventionally follow an ownership model where possession of a private key grants unilateral control. However, this assumption is brittle for emerging settings such as AI agent wallets, organizational custody, and enterprise payroll, where multiple actors must coordinate without exposing secrets or l…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities

2026-04-24 · Ilana Nguyen, Harini Suresh, Thema Monroe-White, Evan Shieh

General AI

Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. While many works highlight the new potential applications of LLMs, there are risks of LLMs encoding and perpetuating h…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents

2026-04-27 · German Marin, Jatin Chaudhary

General AI

Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the Informational Viability Principle: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) +…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Sentiment and Emotion Classification of Indonesian E-Commerce Reviews via Multi-Task BiLSTM and AutoML Benchmarking

2026-04-27 · Hermawan Manurung, Ibrahim Al-Kahfi, Ahmad Rizqi, Martin Clinton Tosima Manullang

General AI

Indonesian marketplace reviews mix standard vocabulary with slang, regional loanwords, numeric shorthands, and emoji, making lexicon-based sentiment tools unreliable in practice. This paper describes a two-track classification pipeline applied to the PRDECT-ID dataset, which contains 5,400 product reviews from 29 Indon…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.3

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

2026-05-12 · Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang, Alborz Geramifard

General AI

In settings where labeled verifiable training data is the binding constraint, each checked example should be allocated carefully. The standard practice is to use this data directly on the model that will be deployed, for example by running GRPO on the deployment student. We argue that this is often an inefficient alloc…

Review
pending
Role
unreviewed
Read
later
arxiv Score 7.3

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

2026-05-12 · Yihao Meng, Zichen Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yue Yu, Hanlin Wang, Haobo Li, Jiapeng Zhu, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen, Huamin Qu

General AI

Autoregressive video generation aims at real-time, open-ended synthesis. Yet, cinematic storytelling is not merely the endless extension of a single scene; it requires progressing through evolving events, viewpoint shifts, and discrete shot boundaries. Existing autoregressive models often struggle in this setting. Trai…

Review
pending
Role
unreviewed
Read
later
arxiv Score 7.3

EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera

2026-05-12 · Christen Millerdurai, Shaoxiang Wang, Yaxu Xie, Vladislav Golyanik, Didier Stricker, Alain Pagani

General AI

Reconstructing the absolute 3D pose and shape of the hands from the user's viewpoint using a single head-mounted camera is crucial for practical egocentric interaction in AR/VR, telepresence, and hand-centric manipulation tasks, where sensing must remain compact and unobtrusive. While monocular RGB methods have made pr…

Review
pending
Role
unreviewed
Read
later
arxiv Score 7.3

Letting the neural code speak: Automated characterization of monkey visual neurons through human language

2026-05-12 · Vedang Lad, Katrin Franke, Tamar Rott Shaham, Surya Ganguli, Andreas S. Tolias, Sophia Sanborn, Nikos Karantzas

General AI

Understanding what individual neurons encode is a core question in neuroscience. In primary visual cortex (V1), mathematical models (e.g., Gabor functions) capture neural selectivity, but no comparable framework exists for higher areas. We show that natural language can fill this role: across macaque V1 and V4, the sel…

Review
pending
Role
unreviewed
Read
later
arxiv Score 7.2

A well-motivated model of pedestrian dynamics

2026-04-29 · Ezel Üsten, Anna Sieben, Mohcine Chraibi, Armin Seyfried

General AI

In pedestrian dynamics, the internal drive that propels individuals toward their goals is typically captured by a single, fixed parameter, the desired walking speed. This simplification overlooks that motivation fluctuates in response to changing spatial and social conditions within a crowd. This paper proposes a dynam…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

2026-04-29 · Yeheng Chen, Chaoxiang Xie, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, Xiaodong Gu

General AI

LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes -- compositional code creation, i.e., building a complete, internally structured class from a specification -- remains underserved. Current evaluations a…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

HealthNLP_Retrievers at ArchEHR-QA 2026: Cascaded LLM Pipeline for Grounded Clinical Question Answering

2026-04-29 · Md Biplob Hosen, Md Alomgeer Hussein, Md Akmol Masud, Omar Faruque, Tera L Reynolds, Lujie Karen Chen

General AI

Patient portals now give individuals direct access to their electronic health records (EHRs), yet access alone does not ensure patients understand or act on the complex clinical information contained in these records. The ArchEHR-QA 2026 shared task addresses this challenge by focusing on grounded question answering ov…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

MoRFI: Monotonic Sparse Autoencoder Feature Identification

2026-04-29 · Dimitris Dimakopoulos, Shay B. Cohen, Ioannis Konstas

General AI

Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next token prediction. Subsequent stages of post-training often introduce new facts outwith the parametric knowledge, giving rise to hallucinations. While it has been demonstrated that supervised fine-tuning (SFT…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

An adaptive wavelet-based PINN for problems with localized high-magnitude source

2026-04-30 · Himanshu Pandey, Ratikanta Behera

General AI

In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer from two fundamental limitations, namely, spectral bias inherent in neural networks and loss imbalance arising from multiscale phenomena. This paper proposes an adaptive w…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

Generalizable Sparse-View 3D Reconstruction from Unconstrained Images

2026-04-30 · Vinayak Gupta, Chih-Hao Lin, Shenlong Wang, Anand Bhattad, Jia-Bin Huang

General AI

Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and fails under sparse v…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure

2026-05-01 · Zihao Ding, Beining Wu, Jun Huang

General AI

Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private. However, joint embedding training entangles forgotten knowledge across both modalities and client gradient subspaces, hindering federated unlearning. Previous federated unlearning appr…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

2026-05-01 · Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb

General AI

Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based methods, including late fusion of image feat…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

2026-05-01 · Shradha Sharma, Swapnil Dhamal, Shweta Jain

General AI

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributi…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

ProPACT: A Proactive AI-Driven Adaptive Collaborative Tutor for Pair Programming

2026-05-04 · Anahita Golrang, Kshitij Sharma, Olga Viberg

General AI

Effective pair programming depends on coordination of attention, cognitive effort, and joint regulation over time, yet most adaptive learning systems remain individual-centric and reactive. This paper introduces ProPACT, a proactive AI-driven adaptive collaborative tutor that treats collaboration itself as the object o…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

PriorNet: Prior-Guided Engagement Estimation from Face Video

2026-05-05 · Alexander Vedernikov

General AI

Engagement estimation from face video remains challenging because facial evidence is often incomplete, labeled data are limited, and engagement annotations are subjective. We present PriorNet, a prior-guided framework that injects task-relevant priors at three stages of the pipeline: preprocessing, model adaptation, an…

Review
pending
Role
unreviewed
Read
later
huggingface Score 7.0

Extending Precipitation Nowcasting Horizons via Spectral Fusion of Radar Observations and Foundation Model Priors

2026-03-23 · Yuze Qin, Qingyong Li, Zhiqing Guo, Wen Wang, Yan Liu, Yangli-ao Geng

General AI

Precipitation nowcasting is critical for disaster mitigation and aviation safety. However, radar-only models frequently suffer from a lack of large-scale atmospheric context, leading to performance degradation at longer lead times. While integrating meteorological variables predicted by weather foundation models offers…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.0

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

2026-03-25 · Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi

General AI

Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.0

GridVAD: Open-Set Video Anomaly Detection via Spatial Reasoning over Stratified Frame Grids

2026-03-26 · Mohamed Eltahir, Ahmed O. Ibrahim, Obada Siralkhatim, Tabarak Abdallah, Sondos Mohamed

Research Track A · General AI

Vision-Language Models (VLMs) are powerful open-set reasoners, yet their direct use as anomaly detectors in video surveillance is fragile: without calibrated anomaly priors, they alternate between missed detections and hallucinated false alarms. We argue the problem is not the VLM itself but how it is used. VLMs should…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.0

Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

2026-03-26 · Han Zhang, Guo-Hua Yuan, Chaohao Yuan, Tingyang Xu, Tian Bian, Hong Cheng, Wenbing Huang, Deli Zhao, Yu Rong

General AI

Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.0

Dual-Stage Invariant Continual Learning under Extreme Visual Sparsity

2026-03-27 · Rangya Zhang, Jiaping Xiao, Lu Bai, Yuhang Zhang, Mir Feroskhan

Research Track A

Continual learning seeks to maintain stable adaptation under non-stationary environments, yet this problem becomes particularly challenging in object detection, where most existing methods implicitly assume relatively balanced visual conditions. In extreme-sparsity regimes, such as those observed in space-based residen…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.0

HAD: Heterogeneity-Aware Distillation for Lifelong Heterogeneous Learning

2026-03-27 · Xuerui Zhang, Xuehao Wang, Zhan Zhuang, Linglan Zhao, Ziyue Li, Xinmin Zhang, Zhihuan Song, Yu Zhang

Research Track A

Lifelong learning aims to preserve knowledge acquired from previous tasks while incorporating knowledge from a sequence of new tasks. However, most prior work explores only streams of homogeneous tasks (e.g., only classification tasks) and neglects the scenario of learning across heterogeneous tasks that posse…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.0

Auto-Stabilized Weak Galerkin Finite Element Methods for Biot's consolidation model on Non-Convex Polytopal Meshes

2026-03-29 · Chunmei Wang, Shangyou Zhang

Research Track A

This paper presents an auto-stabilized weak Galerkin (WG) finite element method for Biot's consolidation model within the classical displacement-pressure two-field formulation. Unlike traditional WG approaches, the proposed scheme achieves numerical stability without requiring traditional stabilizers. Spat…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.0

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

2026-03-30 · Ryan Po, David Junhao Zhang, Amir Hertz, Gordon Wetzstein, Neal Wadhwa, Nataniel Ruiz

General AI

Video world models have shown immense promise for interactive simulation and entertainment, but current systems still struggle with two important aspects of interactivity: user control over the environment for reproducible, editable experiences, and shared inference where players hold influence over a common world. To …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.0

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

2026-04-06 · Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu, Xiangyan Liu, Zhenlong Yuan, Tianyu Pang, Michael Qizhe Shieh, Fengze Liu, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie

General AI

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack s…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.0

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

2026-04-07 · Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu

General AI

We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific beha…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.0

How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

2026-04-16 · Zhen Yang, Ping Jian, Zhongbin Guo, Zuming Zhang, Chengzhi Li, Yonghong Deng, Xinyue Zhang, Wenpeng Lu

Research Track A

Over the past year, spatial intelligence has drawn increasing attention. Many prior works study it from the perspective of visual-spatial intelligence, where models have access to visuospatial information from visual inputs. However, in the absence of visual information, whether linguistic intelligence alone is suffici…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.0

Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

2026-04-27 · Zhongjie Duan, Hong Zhang, Yingda Chen

General AI

Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats, and runtime hooks. This fragmentation makes it difficult to reuse infrastructure across t…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.0

Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains

2026-04-27 · Emaan Bilal Khan, Amy Winecoff, Miranda Bogen, Dylan Hadfield-Menell

General AI

Foundation models are routinely fine-tuned for use in particular domains, yet safety assessments are typically conducted only on base models, implicitly assuming that safety properties persist through downstream adaptation. We test this assumption by analyzing the safety behavior of 100 models, including widely deploye…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 7.0

RemoteZero: Geospatial Reasoning with Zero Human Annotations

2026-05-06 · Liang Yao, Fan Liu, Shengxiang Xu, Chuanyi Zhang, Rui Min, Shimin Di, Yuhui Zheng

General AI

Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has liberated the reasoning path from manual curation, allowing models to generate their own inference chains. Yet a final dependency remains: they are still sup…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.8

Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

2026-03-23 · Alexandra Zelenin, Alexandra Zhuravlyova

General AI

Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction, but its forward pass requires the row-wise norm of W + sBA, a computation that every major framework we surveyed implements by materializing the dense [d_out, d_in] product BA. At d_in = 8192 and rank r = 384, a sin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 6.8

Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level Dropin & Neuroplasticity Mechanisms

2026-03-25 · Yupei Li, Shuaijie Shao, Manuel Milling, Björn Schuller

General AI

Current audio deepfake detection has achieved remarkable performance using diverse deep learning architectures such as ResNet, and has seen further improvements with the introduction of large models (LMs) like Wav2Vec. The success of large language models (LLMs) further demonstrates the benefits of scaling model parame…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers

2026-03-26 · Mingmeng Geng, Yuhang Dong, Thierry Poibeau

General AI

Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Due to the…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

PixelSmile: Toward Fine-Grained Facial Expression Editing

2026-03-26 · Jiabin Hua, Hengyuan Xu, Aojie Li, Wei Cheng, Gang Yu, Xingjun Ma, Yu-Gang Jiang

General AI

Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off b…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Self-Improvement of Large Language Models: A Technical Overview and Future Outlook

2026-03-26 · Haoyan Yang, Mario Xerri, Solha Park, Huajian Zhang, Yiyang Feng, Sai Akhil Kogilathota, Jiawei Zhou

General AI

As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for further improvement. …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Adaptive Block-Scaled Data Types

2026-03-30 · Jack Cook, Hyemin S. Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P. Chandrakasan, Song Han

General AI

NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distributio…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining

2026-03-30 · Anuj Diwan, Eunsol Choi, David Harwath

General AI

We introduce ParaSpeechCLAP, a dual-encoder contrastive model that maps speech and text style captions into a common embedding space, supporting a wide range of intrinsic (speaker-level) and situational (utterance-level) descriptors (such as pitch, texture and emotion) far beyond the narrow set handled by existing mode…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

SAGAI-MID: A Generative AI-Driven Middleware for Dynamic Runtime Interoperability

2026-03-30 · Oliver Aleksander Larsen, Mahyar T. Moghaddam

General AI

Modern distributed systems integrate heterogeneous services, such as REST APIs with different schema versions, GraphQL endpoints, and IoT devices with proprietary payloads, and these integrations suffer from persistent schema mismatches. Traditional static adapters require manual coding for every schema pair and cannot handle novel combinations …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Temporal Credit Is Free

2026-03-30 · Aur Shalev Merin

General AI

Recurrent networks do not need Jacobian propagation to adapt online. The hidden state already carries temporal credit through the forward pass; immediate derivatives suffice if you stop corrupting them with stale trace memory and normalize gradient scales across parameter groups. An architectural rule predicts when nor…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

XSPA: Crafting Imperceptible X-Shaped Sparse Adversarial Perturbations for Transferable Attacks on VLMs

2026-03-30 · Chengyin Hu, Jiaju Han, Xuemeng Sun, Qike Zhang, Yiwei Wei, Ang Li, Chunlei Meng, Xiang Chen, Jiahuan Long

General AI

Vision-language models (VLMs) rely on a shared visual-textual representation space to perform tasks such as zero-shot classification, image captioning, and visual question answering (VQA). While this shared space enables strong cross-task generalization, it may also introduce a common vulnerability: small visual pertur…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Multimodal Higher-Order Brain Networks: A Topological Signal Processing Perspective

2026-03-31 · Breno C. Bispo, Stefania Sardellitti, Juliano B. Lima, Fernando A. N. Santos

General AI

Brain connectomics is still largely dominated by pairwise-based models, such as graphs, which cannot represent circulatory or higher-order functional interactions. In this paper, we propose a multimodal framework based on Topological Signal Processing (TSP) that models the brain as a higher-order topological domain and…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Quantization with Unified Adaptive Distillation to enable multi-LoRA based one-for-all Generative Vision Models on edge

2026-03-31 · Sowmya Vajrala, Aakash Parmar, Prasanna R, Sravanth Kodavanti, Manjunath Arveti, Srinivas Soumitri Miriyala, Ashok Senapati

General AI

Generative Artificial Intelligence (GenAI) features such as image editing, object removal, and prompt-guided image transformation are increasingly integrated into mobile applications. However, deploying Large Vision Models (LVMs) for such tasks on resource-constrained devices remains challenging due to their high memor…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Tucker Attention: A generalization of approximate attention mechanisms

2026-03-31 · Timon Klein, Jonas Kusch, Sebastian Sager, Stefan Schnake, Steffen Schotthöfer

General AI

Efforts to reduce the memory footprint of multi-head self-attention (MHA) have spawned a rich portfolio of methods, e.g., grouped-query attention (GQA) and multi-head latent attention (MLA). These methods leverage specialized low-rank factorizations across embedding dimensions or attentio…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

BVFLMSP: Bayesian Vertical Federated Learning for Multimodal Survival with Privacy

2026-04-02 · Abhilash Kar, Basisth Saha, Tanmay Sen, Biswabrata Pradhan

General AI

Multimodal time-to-event prediction often requires integrating sensitive data distributed across multiple parties, making centralized model training impractical due to privacy constraints. At the same time, most existing multimodal survival models produce single deterministic predictions without indicating how confiden…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning

2026-04-02 · Sten Rüdiger, Sebastian Raschka

General AI

Minor Component Adaptation (MiCA) is a novel parameter-efficient fine-tuning method for large language models that focuses on adapting underutilized subspaces of model representations. Unlike conventional methods such as Low-Rank Adaptation (LoRA), which target dominant subspaces, MiCA leverages Singular Value Decompos…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives

2026-04-02 · Hao Zhu, Di Zhou, Donna Slonim

General AI

Understanding causal dependencies in observational data is critical for informing decision-making. These relationships are often modeled as Bayesian Networks (BNs) and Directed Acyclic Graphs (DAGs). Existing methods, such as NOTEARS and DAG-GNN, often face issues with scalability and stability in high-dimensional data…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

ClickAIXR: On-Device Multimodal Vision-Language Interaction with Real-World Objects in Extended Reality

2026-04-06 · Dawar Khan, Alexandre Kouyoumdjian, Xinyu Liu, Omar Mena, Dominik Engel, Ivan Viola

General AI

We present ClickAIXR, a novel on-device framework for multimodal vision-language interaction with objects in extended reality (XR). Unlike prior systems that rely on cloud-based AI (e.g., ChatGPT) or gaze-based selection (e.g., GazePointAR), ClickAIXR integrates an on-device vision-language model (VLM) with a controlle…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

FAVE: Flow-based Average Velocity Establishment for Sequential Recommendation

2026-04-06 · Ke Shi, Yao Zhang, Feng Guo, Jinyuan Zhang, JunShuo Zhang, Shen Gao, Shuo Shang

General AI

Generative recommendation has emerged as a transformative paradigm for capturing the dynamic evolution of user intents in sequential recommendation. While flow-based methods improve the efficiency of diffusion models, they remain hindered by the "Noise-to-Data" paradigm, which introduces two critical inefficiencies: …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

2026-04-07 · Qimin Zhong, Hao Liao, Haiming Qin, Mingyang Zhou, Rui Mao, Wei Chen, Naipeng Chao

General AI

Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical pers…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries

2026-04-07 · Andrew Kurtz, Klaudia Krawiecka

General AI

The governance of artificial intelligence has a blind spot: the machine identities that AI systems use to act. AI agents, service accounts, API tokens, and automated workflows now outnumber human identities in enterprise environments by ratios exceeding 80 to 1, yet no integrated framework exists to govern them. A sing…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Bridging the Gap between Micro-scale Traffic Simulation and 4D Digital Cityscapes

2026-04-09 · Longxiang Jiao, Lukas Hofmann, Yiru Yang, Zhanyi Wu, Jonas Egeler

General AI

While micro-scale traffic simulations provide essential data for urban planning, they are rarely coupled with the high-fidelity visualization or auralization necessary for effective stakeholder communication. In this work, we present a real-time 4D visualization framework that couples the SUMO traffic with a photoreali…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Fail2Drive: Benchmarking Closed-Loop Driving Generalization

2026-04-09 · Simon Gerstenecker, Andreas Geiger, Katrin Renz

General AI

Generalization under distribution shift remains a central bottleneck for closed-loop autonomous driving. Although simulators like CARLA enable safe and scalable testing, existing benchmarks rarely measure true generalization: they typically reuse training scenarios at test time. Success can therefore reflect memorizati…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction

2026-04-09 · Tao Xie, Peishan Yang, Yudong Jin, Yingfeng Cai, Wei Yin, Weiqiang Ren, Qian Zhang, Wei Hua, Sida Peng, Xiaoyang Guo, Xiaowei Zhou

General AI

This paper addresses the task of large-scale 3D scene reconstruction from long video sequences. Recent feed-forward reconstruction models have shown promising results by directly regressing 3D geometry from RGB images without explicit 3D priors or geometric constraints. However, these methods often struggle to maintain…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal

2026-04-09 · Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha

General AI

Applying steering vectors to large language models (LLMs) is an efficient and effective model alignment technique, but we lack an interpretable explanation for how it works, specifically what internal mechanisms steering vectors affect and how this results in different model outputs. To investigate the causal mechani…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

A systematic literature Review for Transformer-based Software Vulnerability detection

2026-04-27 · Fiza Naseer, Javed Ali Khan, Muhammad Yaqoob, Alexios Mylonas, Ishaya Gambo

General AI

Context: Software vulnerabilities pose significant security threats to software systems, especially as software is increasingly used across many areas of daily life, including health, government, and finance. Recently, transformer-based models have demonstrated promising results in automatic software vulnerability iden…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.8

From Review to Design: Ethical Multimodal Driver Monitoring Systems for Risk Mitigation, Incident Response, and Accountability in Automated Vehicles

2026-05-07 · Bilal Khana, Waseem Shariff, Rory Coyne, Muhammad Ali Farooq, Peter Corcoran

General AI

As vehicles transition toward higher levels of automation, Driver Monitoring Systems (DMS) have become essential for ensuring human oversight, safety, and regulatory compliance in a vehicle. These systems rely on multimodal sensing and AI-driven inference to assess driver attention, cognitive state, and readiness to ta…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

Filtering Memorization from Parameter-Space in Diffusion Models

2026-05-11 · Yu Zhe, Yang Jiayan, Wei Junhao, Yu-Lin Tsai, Wang Chen

General AI

Low-Rank Adaptation (LoRA) has become a widely used mechanism for customizing diffusion models, enabling users to inject new visual concepts or styles through lightweight parameter updates. However, LoRAs can memorize training images, causing generated outputs to reproduce copyrighted or sensitive content. This risk is…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

Per-Loss Adapters for Gradient Conflict in Physics-Informed Neural Networks

2026-05-11 · Bum Jun Kim, Gnankan Landry Regis N'guessan

General AI

Physics-informed neural networks (PINNs) train a single neural approximation by minimizing multiple physics- and data-derived losses, but the gradients of these losses often interfere and can stall optimization. Existing remedies typically treat this pathology either through scalar loss balancing or full-parameter-spac…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.5

Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training

2026-04-02 · William Hoy, Binxu Wang, Xu Pan

Research Track A · General AI

Evolution Strategies (ES) have emerged as a scalable gradient-free alternative to reinforcement learning based LLM fine-tuning, but it remains unclear whether comparable task performance implies comparable solutions in parameter space. We compare ES and Group Relative Policy Optimization (GRPO) across four tasks in bot…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.5

TRU: Targeted Reverse Update for Efficient Multimodal Recommendation Unlearning

2026-04-02 · Zhanting Zhou, KaHou Tam, Ziqiang Zheng, Zeyu Ma

Research Track A · General AI

Multimodal recommendation systems (MRS) jointly model user-item interaction graphs and rich item content, but this tight coupling makes user data difficult to remove once learned. Approximate machine unlearning offers an efficient alternative to full retraining, yet existing methods for MRS mainly rely on a largely uni…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing

2026-04-03 · Anastasiia Filippova, David Grangier, Marco Cuturi, João Monteiro

General AI

Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory footprint of KV caching is significant and heavily impacts serving costs. This work proposes to lessen these memory requirements. While recent work has l…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.5

Exclusive Unlearning

2026-04-07 · Mutsumi Sasaki, Kouta Nakayama, Yusuke Miyao, Yohei Oseki, Masaru Isonuma

Research Track A · General AI

When introducing Large Language Models (LLMs) into industrial applications, such as healthcare and education, the risk of generating harmful content becomes a significant challenge. While existing machine unlearning methods can erase specific harmful knowledge and expressions, diverse harmful content makes comprehensiv…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models

2026-04-13 · Md Tanvirul Alam

General AI

Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mappi…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

Continuous Adversarial Flow Models

2026-04-13 · Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan

General AI

We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which uses a fixed mean-squared-error criterion, our approach introduces a learned discriminator to guide training. This change in objective induces a different generalized di…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

2026-04-13 · Dujun Nie, Fengjiao Chen, Qi Lv, Jun Kuang, Xiaoyu Li, Xuezhi Cao, Xunliang Cai

General AI

While the shortage of explicit action data limits Vision-Language-Action (VLA) models, human action videos offer a scalable yet unlabeled data source. A critical challenge in utilizing large-scale human video datasets lies in transforming visual signals into ontology-independent representations, known as latent actions…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions

2026-04-13 · Seongyu Kim, Seungwoo Lee, Hyeonggon Ryu, Joon Son Chung, Arda Senocak

General AI

We address the problem of tactile localization, where the goal is to identify image regions that share the same material properties as a tactile input. Existing visuo-tactile methods rely on global alignment and thus fail to capture the fine-grained local correspondences required for this task. The challenge is amplifi…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment

2026-04-13 · Bingyi Cao, Koert Chen, Kevis-Kokitsi Maninis, Kaifeng Chen, Arjun Karpur, Ye Xia, Sahil Dua, Tanmaya Dabral, Guangxing Han, Bohyung Han, Joshua Ainslie, Alex Bewley, Mithun Jacob, René Wagner, Washington Ramos, Krzysztof Choromanski, Mojtaba Seyedhosseini, Howard Zhou, André Araujo

General AI

Recent progress in vision-language pretraining has enabled significant improvements to many downstream computer vision applications, such as classification, retrieval, segmentation and depth prediction. However, a fundamental capability that these models still struggle with is aligning dense patch representations with …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

Self-Adversarial One Step Generation via Condition Shifting

2026-04-14 · Deyuan Liu, Peng Sun, Yansen Han, Zhenglin Cheng, Chuyan Chen, Tao Lin

General AI

The push for efficient text-to-image synthesis has moved the field toward one-step sampling, yet existing methods still face a three-way tradeoff among fidelity, inference speed, and training efficiency. Approaches that rely on external discriminators can sharpen one-step performance, but they often introduce training …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics

2026-04-17 · Heewon Oh

General AI

We present ArtifactNet, a lightweight framework that detects AI-generated music by reframing the problem as forensic physics -- extracting and analyzing the physical artifacts that neural audio codecs inevitably imprint on generated audio. A bounded-mask UNet (ArtifactUNet, 3.6M parameters) extracts codec residuals fro…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation

2026-04-17 · Tristan Kirscher, Alexandra Ertl, Klaus Maier-Hein, Xavier Coubez, Philippe Meyer, Sylvain Faisan

General AI

Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calib…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

Test-Time Adaptation for EEG Foundation Models: A Systematic Study under Real-World Distribution Shifts

2026-04-18 · Gabriel Jason Lee, Jathurshan Pradeepkumar, Jimeng Sun

General AI

Electroencephalography (EEG) foundation models have shown strong potential for learning generalizable representations from large-scale neural data, yet their clinical deployment is hindered by distribution shifts across clinical settings, devices, and populations. Test-time adaptation (TTA) offers a promising solution …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

LLM Safety From Within: Detecting Harmful Content with Internal Representations

2026-04-20 · Difan Jiao, Yilun Liu, Ye Yuan, Zhenwei Tang, Linfeng Du, Haolun Wu, Ashton Anderson

General AI

Guard models are widely used to detect harmful content in user prompts and LLM responses. However, state-of-the-art guard models rely solely on terminal-layer representations and overlook the rich safety-relevant features distributed across internal layers. We present SIREN, a lightweight guard model that harnesses the…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers

2026-04-21 · Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Marco Huber, Andrea Atzori, Naser Damer, Fadi Boutros

General AI

Face Image Quality Assessment (FIQA) aims to assess the recognition utility of face samples and is essential for reliable face recognition (FR) systems. Existing approaches require computationally expensive procedures such as multiple forward passes, backpropagation, or additional training, and only recent work has foc…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

EX-FIQA: Leveraging Intermediate Early eXit Representations from Vision Transformers for Face Image Quality Assessment

2026-04-21 · Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Andrea Atzori, Fadi Boutros, Naser Damer

General AI

Face Image Quality Assessment is crucial for reliable face recognition systems, yet existing Vision Transformer-based approaches rely exclusively on final-layer representations, ignoring quality-relevant information captured at intermediate network depths. This paper presents the first comprehensive investigation of ho…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

2026-04-22 · Adriana Aida, Walida Amer, Katarina Bankovic, Dhruv Behl, Fabian Busch, Annie Bhalla, Minh Duong, Florian Gienger, Rohan Godse, Denis Grachev, Ralf Gulde, Elisa Hagensieker, Junpeng Hu, Shivam Joshi, Tobias Knoblauch, Likith Kumar, Damien LaRocque, Keerthana Lokesh, Omar Moured, Khiem Nguyen, Christian Preyss, Ranjith Sriganesan, Vikram Singh, Carsten Sponner, Anh Tong, Dominik Tuscher, Marc Tuscher, Pavan Upputuri

General AI

Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evalu…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.5

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

2026-04-23 · Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge

General AI

Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchmark with private scenes and trajectories, making fair cross-model comparison impossible. Existing public benchmarks offer useful metrics such as trajectory error, aesth…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.4

Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

2026-04-29 · Hayate Iso, Tiyasa Mitra, Sudipta Mondal, Rasoul Shafipour, Venmugil Elango, Terry Kong, Yuki Huang, Seonjin Na, Izzy Putterman, Benjamin Chislett, Maor Ashkenazi, Joseph Guman, Gerald Shen, Tugrul Konuk, Ashwath Aithal, Ritika Borkar, Ran Zilberstein, Bita Rouhani

General AI

RL post-training of frontier language models is increasingly bottlenecked by autoregressive rollout generation, making rollout acceleration a central systems challenge. Many existing efficiency methods improve throughput by changing the rollout or optimization regime, for example, through off-policy execution, replay, …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.4

Co-Evolving Policy Distillation

2026-04-29 · Naibin Gu, Chenxu Yang, Qingyi Si, Chuanyu Qin, Dingyu Yao, Peng Fu, Zheng Lin, Weiping Wang, Nan Duan, Jiaqi Wang

General AI

RLVR and OPD have become standard paradigms for post-training. We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying capability loss in different ways: mixed RLVR suffers from inter-capability divergence cost, while the pipeline of first trai…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.4

FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing

2026-04-29 · Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt

General AI

Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images across 15 fashion houses spanning 1991-2024 that makes this cultural logic inspectable. Given a photograph of…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.4

The Last Human-Written Paper: Agent-Native Research Artifacts

2026-04-29 · Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Chenyu You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang

General AI

Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are dis…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.4

MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

2026-04-30 · Jona te Lintelo, Lichao Wu, Marina Krček, Sengim Karayalçin, Stjepan Picek

General AI

Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs) have significantly reduced inference costs through sparse activation. However, this sparse activation paradigm also introduces new safety challenges. Since only a subset of experts is engaged for each input, model behavior becomes coupled to routing…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.4

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

2026-05-01 · Minchan Kwon, Sunghyun Baek, Minseo Kim, Jaemyung Yu, Dongyoon Han, Junmo Kim

General AI

Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important, but achieving both is challenging. Generative Flow Networks (GFNs) that perform distribution matching are a promis…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.4

The Scaling Properties of Implicit Deductive Reasoning in Transformers

2026-05-05 · Enrico Vompa, Tanel Tammet

General AI

We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning app…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.3

$λ_A$: A Typed Lambda Calculus for LLM Agent Composition

2026-04-13 · Qin Liu

General AI

Existing LLM agent frameworks lack formal semantics: there is no principled way to determine whether an agent configuration is well-formed or will terminate. We present $λ_A$, a typed lambda calculus for agent composition that extends the simply-typed lambda calculus with oracle calls, bounded fixpoints (the ReAct loop…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Who Handles Orientation? Investigating Invariance in Feature Matching

2026-04-13 · David Nordström, Johan Edstedt, Fredrik Kahl, Georg Bökman

General AI

Finding matching keypoints between images is a core problem in 3D computer vision. However, modern matchers struggle with large in-plane rotations. A straightforward mitigation is to learn rotation invariance via data augmentation. However, it remains unclear at which stage rotation invariance should be incorporated. I…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem

2026-04-14 · Yinghao Qin, Mosab Bazargani, Edmund K. Burke, Carlos A. Coello Coello, Zhongmin Song, Jun Chen

General AI

This paper tackles the Electric Capacitated Vehicle Routing Problem (E-CVRP) through a bilevel optimization framework that handles routing and charging decisions separately or jointly depending on the search stage. By analyzing their interaction, we introduce a surrogate objective at the upper level to guide the search…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker

2026-04-14 · Junbin Su, Ziteng Xue, Shihui Zhang, Kun Chen, Weiming Hu, Zhipeng Zhang

General AI

Parameter-efficient fine-tuning (PEFT) in multimodal tracking reveals a concerning trend where recent performance gains are often achieved at the cost of inflated parameter budgets, which fundamentally erodes PEFT's efficiency promise. In this work, we introduce SEATrack, a Simple, Efficient, and Adaptive two-stream mu…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Abstract Sim2Real through Approximate Information States

2026-04-16 · Yunfu Deng, Yuhao Li, Josiah P. Hanna

General AI

In recent years, reinforcement learning (RL) has shown remarkable success in robotics when a fast and accurate simulator is available for a given task. When using RL and simulation, more simulator realism is generally beneficial but becomes harder to obtain as robots are deployed in increasingly complex and widescale d…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

ASMR-Bench: Auditing for Sabotage in ML Research

2026-04-17 · Eric Gan, Aryan Bhatt, Buck Shlegeris, Julian Stastny, Vivek Hebbar

General AI

As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results while evading detection. We introduce ASMR-Bench (Auditing for Sabotage in ML Research), a benchmark for evaluating the ability of auditors to detect sabotage in ML resea…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

CIG: Measuring Conversational Information Gain in Deliberative Dialogues with Semantic Memory Dynamics

2026-04-17 · Ming-Bin Chen, Jey Han Lau, Lea Frermann

General AI

Measuring the quality of public deliberation requires evaluating not only civility or argument structure, but also the informational progress of a conversation. We introduce a framework for Conversational Information Gain (CIG) that evaluates each utterance in terms of how it advances collective understanding of the ta…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Cross-Modal Bayesian Low-Rank Adaptation for Uncertainty-Aware Multimodal Learning

2026-04-17 · Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin

General AI

Large pre-trained language models are increasingly adapted to downstream tasks using parameter-efficient fine-tuning (PEFT), but existing PEFT methods are typically deterministic and unimodal, making them poorly suited for low-resource multimodal settings where predictive uncertainty and cross-modal reliability both ma…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.3

Investigating Conversational Agents to Support Secondary School Students Learning CSP

2026-04-17 · Matthew Frazier, Kostadin Damevski, Lori Pollock

General AI

Secondary school students enrolled in the AP Computer Science Principles (CSP) course commonly utilize web resources (e.g., tutorials, Q&A sites) to better understand key concepts in the curriculum. The primary obstacle to using these resources is finding information appropriate for the learning task and student's bac…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability

2026-04-20 · Savya Khosla, Sethuraman T, Aryan Chadha, Alex Schwing, Derek Hoiem

General AI

Despite recent progress, vision-language encoders struggle with two core limitations: (1) weak alignment between language and dense vision features, which hurts tasks like open-vocabulary semantic segmentation; and (2) high token counts for fine-grained visual representations, which limits scalability to long videos. T…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

InvestChat: Exploring Multimodal Interaction via Natural Language, Touch, and Pen in an Investment Dashboard

2026-04-21 · Sarah Lykke Tost, Adson Lucas de Paiva Sales, Henrik Østergaard, Vaishali Dhanoa, Gabriela Molina León

General AI

We designed and implemented InvestChat, a multimodal tablet-based application that supports stock market exploration with multiple coordinated views and an LLM-powered chat. We evaluated the application with 12 novice investors. Our findings suggest that combining natural language, touch, and pen input during stock mar…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

2026-04-21 · Mengting Chen, Zhengrui Chen, Yongchao Du, Zuan Gao, Taihang Hu, Jinsong Lan, Chao Lin, Yefeng Shen, Xingjian Wang, Zhao Wang, Zhengtao Wu, Xiaoli Xu, Zhengze Xu, Hao Yan, Mingzhou Zhang, Jun Zheng, Qinye Zhou, Xiaoyong Zhu, Bo Zheng

General AI

Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly efficient. First, our syst…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

2026-04-21 · Jean Mercat, Sedrick Keh, Kushal Arora, Isabella Huang, Paarth Shah, Haruki Nishimura, Shun Iwase, Katherine Liu

General AI

We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize on the action training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

FedSIR: Spectral Client Identification and Relabeling for Federated Learning with Noisy Labels

2026-04-22 · Sina Gholami, Abdulmoneam Ali, Tania Haghighi, Ahmed Arafa, Minhaj Nur Alam

General AI

Federated learning (FL) enables collaborative model training without sharing raw data; however, the presence of noisy labels across distributed clients can severely degrade the learning performance. In this paper, we propose FedSIR, a multi-stage framework for robust FL under noisy labels. Different from existing appro…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

2026-04-23 · Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu

General AI

LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models, improving performance on specialized tasks. This quality and ease of dissemination drive the emergence of a skill economy: free s…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

MathDuels: Evaluating LLMs as Problem Posers and Solvers

2026-04-23 · Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik

General AI

As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We introduce MathDuels, a self-play benchmark in which models occupy …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Code for All: Educational Applications of the "Vibe Coding" Hackathon in Programming Education across All Skill Levels

2026-04-24 · Ashley J. Chen, Yijia Cao, Minghao Shao, Ramesh Karri, Muhammad Shafique

General AI

The emergence of large language models has enabled vibe coding, a natural language approach to programming in which users describe intent and AI generates or revises code, potentially broadening access to programming while preserving meaningful learning outcomes. We investigate its educational value through a month-lon…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

CosmicDancePro -- Measuring LEO satellite's orbital decay and network connectivity implications during solar storms

2026-04-24 · Suvam Basak, Amitangshu Pal, Debopam Bhattacherjee

General AI

The May 2024 solar superstorm highlighted the vulnerability of rapidly expanding low Earth orbit (LEO) satellite networks to severe space weather events. To systematically evaluate LEO network resilience, we introduce an open-source tool, CosmicDancePro. It enables a comprehensive analysis of the effects of solar storm…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

2026-04-26 · Sophie Chiang, Tom Brennan, Fethiye Irmak Dogan, Jiaee Cheong, Hatice Gunes

General AI

In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision-Language Models (VLMs), their deployment in clinical settings has raised concerns due to their lack of transparency and…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Aligned Multi-View Scripts for Universal Chart-to-Code Generation

2026-04-27 · Zhihan Zhang, Lizi Liao

General AI

Chart-to-code generation converts a chart image into an executable plotting script, enabling faithful reproduction and editable visualizations. Existing methods are largely Python-centric, limiting practical use and overlooking a critical source of supervision: the same chart can be expressed by semantically equivalent…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

MIMIC: A Generative Multimodal Foundation Model for Biomolecules

2026-04-27 · Siavash Golkar, Jake Kovalic, Irina Espejo Morales, Samuel Sledzieski, Minhuan Li, Ksenia Sokolova, Geraud Krawezik, Alberto Bietti, Claudia Skok Gibbs, Roman Klypa, Shengwei Xiong, Francois Lanusse, Liam Parker, Kyunghyun Cho, Miles Cranmer, Tom Hehir, Michael McCabe, Lucas Meyer, Rudy Morel, Payel Mukhopadhyay, Mariel Pettee, Helen Qu, Jeff Shen, David Fouhey, Hadi Sotoudeh, Vikram Mulligan, Pilar Cossio, Sonya M. Hanson, Alisha N. Jones, Olga G. Troyanskaya, Shirley Ho

General AI

Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or for a fixed forward task. We present MIMIC, a generative multimodal foundation model trained on our newly curated and ali…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer

2026-04-27 · Boyang Wang, Guangyi Xu, Zhipeng Tang, Jiahui Zhang, Zezhou Cheng

General AI

Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. While SBD was widely studied in the literature, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-d…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

2026-04-27 · Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang, Zeyu Zhang, Yefei He, Yanbo Ding, Xirui Hu, Donny Y. Chen, Zhiyuan He, Yuqing Yang, Bohan Zhuang

General AI

Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns v…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

Artistic Practice Opportunities in CST Evaluations: A Longitudinal Group Deployment of ArtKrit

2026-04-29 · Catherine Liu, Tao Long, Asya Vaisberg, Chau Vu, Jiaju Ma, Jingyi Li

General AI

Creativity support tools (CSTs) aim to elevate the quality of artists' creative processes and artifacts. Yet most current CST evaluations overlook temporal and social aspects of tool use. To address this gap, we present a longitudinal, group-based CST evaluation through a three-week deployment of ArtKrit, a computation…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

Causal Learning with Neural Assemblies

2026-04-29 · Evangelia Kopadi, Dimitris Kalles

General AI

Can Neural Assemblies -- groups of neurons that fire together and strengthen through co-activation -- learn the direction of causal influence between variables? While established as a computationally general substrate for classification, parsing, and planning, neural assemblies have not yet been shown to internalize ca…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation

2026-04-29 · Yuxuan Tian, Yurun Jin, Bin Yu, Yukun Shi, Hao Wu, Chi Harold Liu, Kai Chen, Cong Huang

General AI

Robotic manipulation critically requires reasoning about future spatial-temporal interactions, yet existing VLA policies and world-model-enhanced policies do not fully model action-relevant spatial-temporal interaction structure. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-tem…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

Uncertainty-Aware Pedestrian Attribute Recognition via Evidential Deep Learning

2026-04-29 · Zhuofan Lou, Shihang Zhang, Fangle Zhu, Shengjie Ye, Pingyu Wang

General AI

We propose UAPAR, an Uncertainty-Aware Pedestrian Attribute Recognition framework. To the best of our knowledge, this is the first EDL-based uncertainty-aware framework for pedestrian attribute recognition (PAR). Unlike conventional deterministic methods, which fail to assess prediction reliability on low-quality sampl…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

An Empirical Evaluation of Code Smell Detection in Angular Applications

2026-04-30 · Maykon Nunes, Emanuel Coutinho, Carla Bezerra, Ivan Machado

General AI

Angular is one of the most widely adopted frameworks for developing large-scale, dynamic web applications. As projects increase in scope and complexity, developers face growing challenges in managing architecture and maintaining clean, modular code. These challenges often lead to design flaws, commonly referred to as c…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

2026-04-30 · Tianyuan Wu, Chaokun Chang, Lunxi Cao, Wei Gao, Wei Wang

General AI

Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback, yet existing approaches fall into two extremes: application-l…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons

2026-04-30 · Kehong Gong, Zhengyu Wen, Dao Thien Phong, Mingxi Xu, Weixia He, Qi Wang, Ning Zhang, Zhengyu Li, Guanli Hou, Dongze Lian, Xiaoyu He, Mingyuan Zhang, Hanwang Zhang

General AI

Recent methods for arbitrary-skeleton motion capture from monocular video follow a factorized pipeline, where a Video-to-Pose network predicts joint positions and an analytical inverse-kinematics (IK) stage recovers joint rotations. While effective, this design is inherently limited, since joint positions do not fully …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

2026-04-30 · Junyoung Lee, Sookwan Han, Jeonghwan Kim, Inhee Lee, Mingi Choi, Jisoo Kim, Wonjung Woo, Hanbyul Joo

General AI

Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where multiple humans and robots share a workspace, acting concurrently on interleaved subtasks with tight spatial and temporal coupling. This regime remains underexplored because …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

Deep Kernel Learning for Stratifying Glaucoma Trajectories

2026-05-01 · Bruce Rushing, Angela Danquah, Alireza Namazi, Arjun Dirghangi, Heman Shakeri

General AI

Effectively stratifying patient risk in chronic diseases like glaucoma is a major clinical challenge. Clinicians need tools to identify patients at high risk of progression from sparse and irregularly-sampled electronic health records (EHRs). We propose a novel deep kernel learning (DKL) architecture that leverages a G…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.2

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

2026-05-01 · Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan

General AI

Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.2

PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

2026-05-01 · Guandong Li, Mengxia Ye

General AI

Image editing instructions are heterogeneous: a color swap, an object insertion, and a physical-action edit all demand different spatial coverage and different reasoning depth, yet existing reasoning-based editors apply a single fixed inference recipe to every instruction. We argue that adaptivity along both the spatia…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.2

SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 25+ Sign Languages

2026-05-03 · Sen Fang, Hongbin Zhong, Yanxin Zhang, Dimitris N. Metaxas

General AI

Existing large-scale sign language resources typically provide supervision only at the level of raw video-text alignment and are often produced in laboratory settings. While such resources are important for semantic understanding, they do not directly provide a unified interface for open-world recognition and translati…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.0

AVControl: Efficient Framework for Training Audio-Visual Controls

2026-03-25 · Matan Ben-Yosef, Tavi Halperin, Naomi Ken Korem, Mohammad Salama, Harel Cain, Asaf Joseph, Anthony Chen, Urska Jelercic, Ofir Bibi

General AI

Controlling video and audio generation requires diverse modalities, from depth and pose to camera trajectories and audio transformations, yet existing approaches either train a single monolithic model for a fixed set of controls or introduce costly architectural changes for each new modality. We introduce AVControl, a …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.0

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

2026-03-25 · Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna

General AI

Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.0

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

2026-03-29 · Zhongying Deng, Cheng Tang, Ziyan Huang, Jiashi Lin, Ying Chen, Junzhi Ning, Chenglong Ma, Jiyao Liu, Wei Li, Yinghao Zhu, Shujian Gao, Yanyan Huang, Sibo Ju, Yanzhou Su, Pengcheng Chen, Wenhao Tang, Tianbin Li, Haoyu Wang, Yuanfeng Ji, Hui Sun, Shaobo Min, Liang Peng, Feilong Tang, Haochen Xue, Rulin Zhou, Chaoyang Zhang, Wenjie Li, Shaohao Rui, Weijie Ma, Xingyue Zhao, Yibin Wang, Kun Yuan, Zhaohui Lu, Shujun Wang, Jinjie Wei, Lihao Liu, Dingkang Yang, Lin Wang, Yulong Li, Haolin Yang, Yiqing Shen, Lequan Yu, Xiaowei Hu, Yun Gu, Yicheng Wu, Benyou Wang, Minghui Zhang, Angelica I. Aviles-Rivero, Qi Gao, Hongming Shan, Xiaoyu Ren, Fang Yan, Hongyu Zhou, Haodong Duan, Maosong Cao, Shanshan Wang, Bin Fu, Xiaomeng Li, Zhi Hou, Chunfeng Song, Lei Bai, Yuan Cheng, Yuandong Pu, Xiang Li, Wenhai Wang, Hao Chen, Jiaxin Zhuang, Songyang Zhang, Huiguang He, Mengzhang Li, Bohan Zhuang, Zhian Bai, Rongshan Yu, Liansheng Wang, Yukun Zhou, Xiaosong Wang, Xin Guo, Guanbin Li, Xiangru Lin, Dakai Jin, Mianxin Liu, Wenlong Zhang, Qi Qin, Conghui He, Yuqiang Li, Ye Luo, Nanqing Dong, Jie Xu, Wenqi Shao, Bo Zhang, Qiujuan Yan, Yihao Liu, Jun Ma, Zhi Lu, Yuewen Cao, Zongwei Zhou, Jianming Liang, Shixiang Tang, Qi Duan, Dongzhan Zhou, Chen Jiang, Yuyin Zhou, Yanwu Xu, Jiancheng Yang, Shaoting Zhang, Xiaohong Liu, Siqi Luo, Yi Xin, Chaoyu Liu, Haochen Wen, Xin Chen, Alejandro Lozano, Min Woo Sun, Yuhui Zhang, Yue Yao, Xiaoxiao Sun, Serena Yeung-Levy, Xia Li, Jing Ke, Chunhui Zhang, Zongyuan Ge, Ming Hu, Jin Ye, Zhifeng Li, Yirong Chen, Yu Qiao, Junjun He

Research Track A

Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the thrive of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, the curation and assembling of such medical datasets are highly challenging due to the reliance on clinical e…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.0

GEditBench v2: A Human-Aligned Benchmark for General Image Editing

2026-03-30 · Zhangqi Jiang, Zheng Sun, Xianfang Zeng, Yufeng Yang, Xuanyang Zhang, Yongliang Wu, Wei Cheng, Gang Yu, Xu Yang, Bihan Wen

General AI

Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics fail to adequately capture visual consistency, i.e., the preservation of identity, …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.0

NearID: Identity Representation Learning via Near-identity Distractors

2026-04-02 · Aleksandar Cvejic, Rameen Abdal, Abdelrahman Eldesokey, Bernard Ghanem, Peter Wonka

General AI

When evaluating identity-focused tasks such as personalized generation and image editing, existing vision encoders entangle object identity with background context, leading to unreliable representations and metrics. We introduce the first principled framework to address this vulnerability using Near-identity (NearID) d…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.0

ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation

2026-04-05 · Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li

General AI

Selecting LLM-generated code candidates using LLM-generated tests is challenging because the tests themselves may be incorrect. Existing methods either treat all tests equally or rely on ad-hoc heuristics to filter unreliable tests. Yet determining test correctness requires knowing which codes are correct, creating a c…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.0

Demystifying When Pruning Works via Representation Hierarchies

2026-04-06 · Shwai He, Guoheng Sun, Haichao Zhang, Yun Fu, Ang Li

General AI

Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.0

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

2026-04-06 · Yicheng Xiao, Wenhu Zhang, Lin Song, Yukang Chen, Wenbo Li, Nan Jiang, Tianhe Ren, Haokun Lin, Wei Huang, Haoyang Huang, Xiu Li, Nan Duan, Xiaojuan Qi

General AI

Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. Current models are insufficient for fine-grained spatial manipulations, motivating a dedicated assessment suite. Our contributions are listed: (i) We introduce SpatialEdit-Bench, a complete…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.0

IAM: Identity-Aware Human Motion and Shape Joint Generation

2026-04-28 · Wenqi Jia, Zekun Li, Abhay Mittal, Chengcheng Tang, Chuan Guo, Lezi Wang, James Matthew Rehg, Lingling Tao, Size An

General AI

Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morpholog…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.0

Toward Scalable Terminal Task Synthesis via Skill Graphs

2026-04-28 · Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng Zhang, Lilin Wang

General AI

Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for trajectory sampling. H…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.0

When to Trust Imagination: Adaptive Action Execution for World Action Models

2026-05-07 · Rui Wang, Yue Zhang, Jiehong Lin, Kuncheng Luo, Jianan Wang, Zhongrui Wang, Xiaojuan Qi

General AI

World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and future actions. However, current WAMs typically execute a fixed number of predicted actions after each model inference, leaving the robot blind to whether the imagined f…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.0

Implicit Preference Alignment for Human Image Animation

2026-05-08 · Yuanzhi Wang, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Kai Yu, Tianxiang Zheng, Qinglin Lu, Zhen Cui

General AI

Human image animation has witnessed significant advancements, yet generating high-fidelity hand motions remains a persistent challenge due to their high degrees of freedom and motion complexity. While reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, i…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

Parameter-Efficient Fine-Tuning for Medical Text Summarization: A Comparative Study of Lora, Prompt Tuning, and Full Fine-Tuning

2026-03-23 · Ulugbek Shernazarov, Rostislav Svitsov, Bin Shi

General AI

Fine-tuning large language models for domain-specific tasks such as medical text summarization demands substantial computational resources. Parameter-efficient fine-tuning (PEFT) methods offer promising alternatives by updating only a small fraction of parameters. This paper compares three adaptation approaches: Low-Ran…

Review
pending
Role
unreviewed
Read
now
arxiv Score 5.8

Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion

2026-03-23 · Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

General AI

Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit gener…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

A Mentalistic Interface for Probing Folk-Psychological Attribution to Non-Humanoid Robots

2026-03-26 · Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier, Andrea Dan Ryals, Lorenzo Pollini, Mario G. C. A. Cimino

General AI

This paper presents an experimental platform for studying intentional-state attribution toward a non-humanoid robot. The system combines a simulated robot, realistic task environments, and large language model-based explanatory layers that can express the same behavior in mentalistic, teleological, or mechanistic terms…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

Measuring What Matters -- or What's Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors

2026-03-26 · Cole Walsh, Rodica Ivan

General AI

Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable or superior to those of trained human raters, but have frequently been shown to be vulnerable to the influence of construct-i…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

Constitutive parameterized deep energy method for solid mechanics problems with random material parameters

2026-03-27 · Zhangyong Liang, Huanhuan Gao

General AI

In practical structural design and solid mechanics simulations, material properties inherently exhibit random variations within bounded intervals. However, evaluating mechanical responses under continuous material uncertainty remains a persistent challenge. Traditional numerical approaches, such as the Finite Element M…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

Preference-Aligned LoRA Merging: Preserving Subspace Coverage and Addressing Directional Anisotropy

2026-03-27 · Wooseong Jeong, Wonyoung Lee, Kuk-Jin Yoon

General AI

Merging multiple Low-Rank Adaptation (LoRA) modules is promising for constructing general-purpose systems, yet challenging because LoRA update directions span different subspaces and contribute unevenly. When merged naively, such mismatches can weaken the directions most critical to certain task losses while overemphas…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?

2026-03-30 · Ashwini Dasare, Nirmesh Shah, Ashishkumar Gudmalwar, Pankaj Wasnik

General AI

Evaluating AI generated dubbed content is inherently multi-dimensional, shaped by synchronization, intelligibility, speaker consistency, emotional alignment, and semantic context. Human Mean Opinion Scores (MOS) remain the gold standard but are costly and impractical at scale. We present a hierarchical multimodal archi…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

DinoDental: Benchmarking DINOv3 as a Unified Vision Encoder for Dental Image Analysis

2026-03-30 · Kun Tang, Xinquan Yang, Mianjie Zheng, Xuefen Liu, Xuguang Li, Xiaoqi Guo, Ruihan Chen, Linlin Shen, He Meng

General AI

The scarcity and high cost of expert annotations in dental imaging present a significant challenge for the development of AI in dentistry. DINOv3, a state-of-the-art, self-supervised vision foundation model pre-trained on 1.7 billion images, offers a promising pathway to mitigate this issue. However, its reliability wh…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

VAANI: Capturing the language landscape for an inclusive digital India

2026-03-30 · Sujith Pulikodan, Abhayjeet Singh, Agneedh Basu, Lokesh Rady, Nihar Desai, Pavan Kumar J, Prajjwal Srivastav, Pranav D Bhat, Raghu Dharmaraju, Ritika Gupta, Sathvik Udupa, Saurabh Kumar, Sumit Sharma, Vaibhav Vishwakarma, Visruth Sanka, Dinesh Tewari, Harsh Dhand, Amrita Kamat, Sukhwinder Singh, Shikhar Vashishth, Partha Talukdar, Raj Acharya, Prasanta Kumar Ghosh

General AI

Project VAANI is an initiative to create an India-representative multi-modal dataset that comprehensively maps India's linguistic diversity, starting with 165 districts across the country in its first two phases. Speech data is collected through a carefully structured process that uses image-based prompts to encourage …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

Benchmarking PhD-Level Coding in 3D Geometric Computer Vision

2026-03-31 · Wenyi Li, Renkai Luo, Yue Yu, Huan-ang Gao, Mingju Gao, Li Yuan, Chaoyou Fu, Hao Zhao

General AI

AI-assisted coding has rapidly reshaped software practice and research workflows, yet today's models still struggle to produce correct code for complex 3D geometric vision. If models could reliably write such code, the research of our community would change substantially. To measure progress toward that goal, we introd…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives

2026-03-31 · Mohammadhossein Khojasteh, Yifan Jiang, Stefano De Giorgis, Frank van Harmelen, Filip Ilievski

General AI

Analogical reasoning is a key driver of human generalization in problem-solving and argumentation. Yet, analogies between narrative structures remain challenging for machines. Cognitive engines for structural mapping are not directly applicable, as they assume pre-extracted entities, whereas LLMs' performance is sensit…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

ReFormeR: Learning and Applying Explicit Query Reformulation Patterns

2026-04-01 · Amin Bigdeli, Mert Incesu, Negar Arabzadeh, Charles L. A. Clarke, Ebrahim Bagheri

General AI

We present ReFormeR, a pattern-guided approach for query reformulation. Instead of prompting a language model to generate reformulations of a query directly, ReFormeR first elicits short reformulation patterns from pairs of initial queries and empirically stronger reformulations, consolidates them into a compact librar…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers

2026-04-01 · Kawtar Zaher, Olivier Buisson, Alexis Joly

General AI

Building on existing approaches, we revisit Human-in-the-Loop Object Retrieval, a task that consists of iteratively retrieving images containing objects of a class-of-interest, specified by a user-provided query. Starting from a large unlabeled image collection, the aim is to rapidly identify diverse instances of an ob…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

Transformer self-attention encoder-decoder with multimodal deep learning for response time series forecasting and digital twin support in wind structural health monitoring

2026-04-02 · Feiyu Zhou, Marios Impraimakis

General AI

The wind-induced structural response forecasting capabilities of a novel transformer methodology are examined here. The model also provides a digital twin component for bridge structural health monitoring. Firstly, the approach uses the temporal characteristics of the system to train a forecasting model. Secondly, the …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

LoMa: Local Feature Matching Revisited

2026-04-06 · David Nordström, Johan Edstedt, Georg Bökman, Jonathan Astermark, Anders Heyden, Viktor Larsson, Mårten Wadenbäck, Michael Felsberg, Fredrik Kahl

General AI

Local feature matching has long been a fundamental component of 3D vision systems such as Structure-from-Motion (SfM), yet progress has lagged behind the rapid advances of modern data-driven approaches. The newer approaches, such as feed-forward reconstruction models, have benefited extensively from scaling dataset siz…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

CoStream: Codec-Guided Resource-Efficient System for Video Streaming Analytics

2026-04-07 · Yulin Zou, Yan Chen, Wenyan Chen, JooYoung Park, Shivaraman Nitin, Luo Tao, Francisco Romero, Dmitrii Ustiugov

General AI

Video streaming analytics is a crucial workload for vision-language model serving, but the high cost of multimodal inference limits scalability. Prior systems reduce inference cost by exploiting temporal and spatial redundancy in video streams, but they target either the vision transformer (ViT) or the LLM with a limit…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors

2026-04-07 · Junbin Zhang, Meng Cao, Feng Tan, Yikai Lin, Yuexian Zou

General AI

Achieving fine-grained and structurally sound controllability is a cornerstone of advanced visual generation. Existing part-based frameworks treat user-provided parts as an unordered set and therefore ignore their intrinsic spatial and semantic relationships, which often results in compositions that lack structural int…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

2026-04-07 · Lin Mu, Haiyang Wang, Li Ni, Lei Sang, Zhize Wu, Peiquan Jin, Yiwen Zhang

General AI

Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs), and recent Mixture-of-Experts (MoE) extensions further enhance flexibility by dynamically combining multiple LoRA experts. However, existing MoE-augmented LoRA methods assume that experts operate independently, often lea…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

2026-04-09 · Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou, Hui Wang, Baole Fang, Yang Tian, Mulin Yu, Qiaojun Yu, Li Ma, Hengjie Li, Hanqing Wang, Jia Zeng, Jiangmiao Pang

General AI

Robotic manipulation with deformable objects represents a data-intensive regime in embodied learning, where shape, contact, and topology co-evolve in ways that far exceed the variability of rigids. Although simulation promises relief from the cost of real-world data acquisition, prevailing sim-to-real pipelines remain …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

From Syntax to Emotion: A Mechanistic Analysis of Emotion Inference in LLMs

2026-04-28 · Bangzhao Shu, Arinjay Singh, Mai ElSherief

General AI

Large language models (LLMs) are increasingly used in emotionally sensitive human-AI applications, yet little is known about how emotion recognition is internally represented. In this work, we investigate the internal mechanisms of emotion recognition in LLMs using sparse autoencoders (SAEs). By analyzing sparse featur…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 5.5

How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

2026-03-23 · Zixian Huang, Kaichen Yang, Xu Huang, Feiyang Hao, Qiming Ge, Bowen Li, He Du, Kai Chen, Qipeng Guo

General AI

A widely adopted strategy for model enhancement is to use synthetic data generated by a stronger model for supervised fine-tuning (SFT). However, for emerging reasoning models like Qwen3-8B, this approach often fails to improve reasoning capabilities and can even lead to a substantial drop in performance. In this work,…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.5

Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling

2026-04-10 · Ximing Xing, Ziteng Xue, Zhenxi Li, Weicong Liang, Linqing Wang, Zhantao Yang, Tiankai Hang, Zijin Yin, Qinglin Lu, Chunyu Wang, Qian Yu

General AI

Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on generic byte-level tokenization inherited from natural language processing, which poorly reflects the geometric structure of vector graphics…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.5

ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack

2026-04-14 · Yein Park, Jungwoo Park, Jaewoo Kang

General AI

Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes. As tense jailbreaking demonstrates that models refusing harmful requests often comply when rephrased in past tense, a critical generalization gap is revealed in current al…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.5

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

2026-04-14 · Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, Zhiqiang Shen

General AI

Claude Code is an agentic coding tool that can run shell commands, edit files, and call external services on behalf of the user. This study describes its comprehensive architecture by analyzing the publicly available TypeScript source code and further comparing it with OpenClaw, an independent open-source AI agent syst…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.5

Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3

2026-04-16 · Natapong Nitarach

General AI

Majority voting over multiple LLM attempts improves mathematical reasoning, but correlated errors limit the effective sample size. A natural fix is to assign different reasoning strategies to different voters. The approach, Diverse Prompt Mixer, is tested on the AIMO 3 competition: 3 models, 23+ experiments, 50 IMO-lev…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 5.5

The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus

2026-04-18 · Syed Muhammad Aqdas Rizvi

General AI

Decentralized Autonomous Organizations (DAOs) are inclined to explore Small Language Models (SLMs) as edge-native constitutional firewalls to vet proposals and mitigate semantic social engineering. While scaling inference-time compute (System 2) enhances formal logic, its efficacy in highly adversarial, cryptoeconomic gov…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.5

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

2026-04-20 · Qifan Zhang, Dongyang Ma, Tianqing Fang, Jia Li, Jing Tang, Nuo Chen, Haitao Mi, Yan Wang

General AI

Most agents today "self-evolve" by following rewards and rules defined by humans. However, this process remains fundamentally dependent on external supervision; without human guidance, the evolution stops. In this work, we train agents to possess an intrinsic meta-evolution capability to spontaneously learn about uns…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 5.5

TEMPO: Scaling Test-time Training for Large Reasoning Models

2026-04-21 · Qingyang Zhang, Xinke Kong, Haitao Wu, Qinghua Hu, Minghao Wu, Baosong Yang, Yu Cheng, Yun Luo, Ganqu Cui, Changqing Zhang

General AI

Test-time training (TTT) adapts model parameters on unlabeled test instances during inference time, which continuously extends capabilities beyond the reach of offline training. Despite initial gains, existing TTT methods for LRMs plateau quickly and do not benefit from additional test-time compute. Without external ca…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 5.5

For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs

2026-04-25 · Wenlong Deng, Qi Zeng, Jiaming Zhang, Minghui Chen, Zixin Ding, Christos Thrampoulidis, Boying Gong, Xiaoxiao Li

General AI

Data valuation is essential for enhancing the transparency and accountability of large language models (LLMs) and vision-language models (VLMs). However, existing methods typically rely on gradient computations, making them computationally prohibitive for billion-parameter models and precluding batch parallelization. I…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.4

ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

2026-04-29 · Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang

General AI

In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, additional attributes are combined to associate with data samples. We show that the space spanned b…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.4

Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

2026-05-01 · Yi Wang, Xinchen Li, Pengwei Xie, Pu Yang, Buqing Nie, Yunuo Cai, Qinglin Zhang, Chendi Qu, Jeffrey Wu, Jianheng Song, Xinlin Ren, Jingshun Huang, Mingjie Pan, Siyuan Feng, Zhi Chen, Jianlan Luo

General AI

Generalist robot policies increasingly benefit from large-scale pretraining, but offline data alone is insufficient for robust real-world deployment. Deployed robots encounter distribution shifts, long-tail failures, task variations, and human correction opportunities that fixed demonstration datasets cannot fully capt…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.3

LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

2026-04-13 · Junhao Chen, Kejun Gao, Yuehan Cui, Mingze Sun, Mingjin Chen, Shaohui Wang, Xiaoxiao Long, Fei Ma, Qi Tian, Ruqi Huang, Hao Zhao

General AI

Despite rapid progress in video generation, existing models are incapable of producing vector animation, a dominant and highly expressive form of multimedia on the Internet. Vector animations offer resolution-independence, compactness, semantic structure, and editable parametric motion representations, yet current gene…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Pair2Scene: Learning Local Object Relations for Procedural Scene Generation

2026-04-13 · Xingjian Ran, Shujie Zhang, Weipeng Zhong, Li Luo, Bo Dai

General AI

Generating high-fidelity 3D indoor scenes remains a significant challenge due to data scarcity and the complexity of modeling intricate spatial relations. Current methods often struggle to scale beyond the training distribution to dense scenes or rely on LLMs/VLMs that lack the ability to reason precisely about space. Buildi…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Learning Versatile Humanoid Manipulation with Touch Dreaming

2026-04-14 · Yaru Niu, Zhenlong Fang, Binghong Chen, Shuai Zhou, Revanth Senthilkumaran, Hao Zhang, Bingqing Chen, Chen Qiu, H. Eric Tseng, Jonathan Francis, Ding Zhao

General AI

Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first de…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

PAL: Personal Adaptive Learner

2026-04-14 · Megha Chakraborty, Darssan L. Eswaramoorthi, Madhur Thareja, Het Riteshkumar Shah, Finlay Palmer, Aryaman Bahl, Michelle A Ihetu, Amit Sheth

General AI

AI-driven education platforms have made some progress in personalisation, yet most remain constrained to static adaptation (predefined quizzes, uniform pacing, or generic feedback), limiting their ability to respond to learners' evolving understanding. This shortfall highlights the need for systems that are both context…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Probabilistic Feature Imputation and Uncertainty-Aware Multimodal Federated Aggregation

2026-04-14 · Nafis Fuad Shahid, Maroof Ahmed, Md Akib Haider, Saidur Rahman Sagor, Aashnan Rahman, Md Azam Hossain

General AI

Multimodal federated learning enables privacy-preserving collaborative model training across healthcare institutions. However, a fundamental challenge arises from modality heterogeneity: many clinical sites possess only a subset of modalities due to resource constraints or workflow variations. Existing approaches addre…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Scalable Trajectory Generation for Whole-Body Mobile Manipulation

2026-04-14 · Yida Niu, Xinhai Chang, Xin Liu, Ziyuan Jiao, Yixin Zhu

General AI

Robots deployed in unstructured environments must coordinate whole-body motion -- simultaneously moving a mobile base and arm -- to interact with the physical world. This coupled mobility and dexterity yields a state space that grows combinatorially with scene and object diversity, demanding datasets far larger than th…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Agent-Aided Design for Dynamic CAD Models

2026-04-16 · Mitch Adler, Matthew Russo, Michael Cafarella

General AI

In the past year, researchers have started to create agentic systems that can design real-world CAD-style objects in a training-free setting, a new variety of system that we call Agent-Aided Design. Generally speaking, these systems place an agent in a feedback loop in which it can write code, compile that code to an a…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

2026-04-16 · Manan Gupta, Dhruv Kumar

General AI

LLM-as-judge frameworks are increasingly used for automatic NLG evaluation, yet their per-instance reliability remains poorly understood. We present a two-pronged diagnostic toolkit applied to SummEval: (1) a transitivity analysis that reveals widespread per-input inconsistency masked by low aggregate violat…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Why Do Vision Language Models Struggle To Recognize Human Emotions?

2026-04-16 · Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara, Steven McDonagh

General AI

Understanding emotions is a fundamental ability that intelligent systems need in order to interact with humans. Vision-language models (VLMs) have made tremendous progress in the last few years on many visual tasks, potentially offering a promising solution for understanding emotions. However, it is surprising that even t…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos

2026-04-17 · Deepak Kumar, Abhishek Pratap Singh, Puneet Kumar, Xiaobai Li, Balasubramanian Raman

General AI

Understanding affective dynamics in real-world social systems is fundamental to modeling and analyzing human-human interactions in complex environments. Group affect emerges from intertwined human-human interactions, contextual influences, and behavioral cues, making its quantitative modeling a challenging computationa…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.3

The QBF Gallery 2023

2026-04-17 · Simone Heisinger, Luca Pulina, Martina Seidl

General AI

The QBF Gallery 2023, the last QBF evaluation event, continues the tradition of surveying and documenting the state of the art in solving quantified Boolean formulas (QBFs). It provides a detailed overview by collecting newly developed solvers and formulas as benchmarks. This report documents the solvers and formulas submitt…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.3

AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation

2026-04-20 · Rui Qian, Chuanhang Deng, Qiang Huang, Jian Xiong, Mingxuan Li, Yingbo Zhou, Wei Zhai, Jintao Chen, Dejing Dou

General AI

Reasoning segmentation requires models to ground complex, implicit textual queries into precise pixel-level masks. Existing approaches rely on a single segmentation token <SEG>, whose hidden state implicitly encodes both semantic reasoning and spatial localization, limiting the model's ability to explicitly …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale

2026-04-20 · A. Sophia Koepke, Daniil Zverev, Shiry Ginosar, Alexei A. Efros

General AI

The Platonic Representation Hypothesis suggests that neural networks trained on different modalities (e.g., text and images) align and eventually converge toward the same representation of reality. If true, this has significant implications for whether modality choice matters at all. We show that the experimental evide…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Dual Alignment Between Language Model Layers and Human Sentence Processing

2026-04-20 · Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki, Ethan Gotlieb Wilcox

General AI

A recent study (Kuribayashi et al., 2025) has shown that human sentence processing behavior, typically measured on syntactically unchallenging constructions, can be effectively modeled using surprisal from early layers of large language models (LLMs). This raises the question of whether such advantages of internal laye…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Evolutionary Negative Module Pruning for Better LoRA Merging

2026-04-20 · Anda Cao, Zhuo Gou, Yi Wang, Kaixuan Chen, Yu Wang, Can Wang, Mingli Song, Jie Song

General AI

Merging multiple Low-Rank Adaptation (LoRA) experts into a single backbone is a promising approach for efficient multi-task deployment. While existing methods strive to alleviate interference via weight interpolation or subspace alignment, they rest upon the implicit assumption that all LoRA matrices contribute constru…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

2026-04-20 · Haoyu Wu, Jiwen Yu, Yingtian Zou, Xihui Liu

General AI

Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing approaches are limited …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy

2026-04-20 · Wei Yao, Haohan Ma, Hongwen Zhang, Yunlian Sun, Liangjun Xing, Zhile Yang, Yuanjun Guo, Yebin Liu, Jinhui Tang

General AI

Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in multi-agent coordination, and limited generalization across objects. In this paper, we present SynAgent, a unified framework that enables scalable and physicall…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs

2026-04-21 · Isaiah Thompson, Tanmay Sen, Ritwik Bhattacharya

General AI

Modern distributed systems generate massive volumes of log data that are critical for detecting anomalies and cyber threats. However, in real world settings, these logs are often distributed across multiple organizations and cannot be centralized due to privacy and security constraints. Existing log anomaly detection m…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

InHabit: Leveraging Image Foundation Models for Scalable 3D Human Placement

2026-04-21 · Nikita Kister, Pradyumna YM, István Sárándi, Jiayi Wang, Anna Khoreva, Gerard Pons-Moll

General AI

Training embodied agents to understand 3D scenes as humans do requires large-scale data of people meaningfully interacting with diverse environments, yet such data is scarce. Real-world motion capture is costly and limited to controlled settings, while existing synthetic datasets rely on simple geometric heuristics tha…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Structure-guided molecular design with contrastive 3D protein-ligand learning

2026-04-21 · Carles Navarro, Philipp Tholke, Gianni de Fabritiis

General AI

Structure-based drug discovery faces the dual challenge of accurately capturing 3D protein-ligand interactions while navigating ultra-large chemical spaces to identify synthetically accessible candidates. In this work, we present a unified framework that addresses these challenges by combining contrastive 3D structure …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

2026-04-21 · Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge

General AI

Scaling humanoid foundation models is bottlenecked by the scarcity of robotic data. While massive egocentric human data offers a scalable alternative, bridging the cross-embodiment chasm remains a fundamental challenge due to kinematic mismatches. We introduce UniT (Unified Latent Action Tokenizer via Visual Anchoring)…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Interval POMDP Shielding for Imperfect-Perception Agents

2026-04-22 · William Scarbro, Ravi Mangal

General AI

Autonomous systems that rely on learned perception can make unsafe decisions when sensor readings are misclassified. We study shielding for this setting: given a proposed action, a shield blocks actions that could violate safety. We consider the common case where system dynamics are known but perception uncertainty mus…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Reliability as a Design Principle: A Systematic Review and Integrated Framework for Renewable-Based Microgrids

2026-04-22 · Mohammed Zeehan Saleheen, Markus Wagner, Reza Razzaghi, Hao Wang

General AI

Reliable operation is a central motivation for deploying renewable-based microgrids. This paper presents a systematic rapid review that positions reliability as the central organizing principle for microgrid design. Specifically, this review systematically synthesizes recent literature to examine how planning assumptio…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

2026-04-23 · Hao-Yu Hsu, Tianhang Cheng, Jing Wen, Alexander G. Schwing, Shenlong Wang

General AI

Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts pu…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

2026-04-23 · Yanran Zhang, Wenzhao Zheng, Yifei Li, Bingyao Yu, Yu Zheng, Lei Chen, Jiwen Lu, Jie Zhou

General AI

In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent, development, these two fields have evolved distinct architectural paradigms: the former predominantly relies on generative networks, while the latter favors discrimin…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Complexity of Linear Regions in Self-supervised Deep ReLU Networks

2026-04-27 · Mufhumudzi Muthivhi, Terence L. van Zyl

General AI

There has been growing interest in studying the complexity of Rectified Linear Unit (ReLU) based activation networks. Recent work investigates the evolution of the number of piecewise-linear partitions (linear regions) that are formed during training. However, current research is limited to examining the complexity of …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.2

Cognitive Atrophy and Systemic Collapse in AI-Dependent Software Engineering

2026-04-29 · Frank Ginac

General AI

The integration of Large Language Models (LLMs) into the software development lifecycle (SDLC) masks a critical socio-technical failure: Cognitive-Systemic Collapse. This paper introduces "Epistemological Debt," the hidden carrying cost incurred when engineers substitute logical derivation with passive AI verification.…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Hot Fixing in the Wild

2026-04-29 · Carol Hanna, Karine Even-Mendoza, W. B. Langdon, Mar Zamorano López, Justyna Petke, Federica Sarro

General AI

Despite the operational importance of hot fixes, large-scale evidence on how they reshape routine maintenance workflows, particularly in the era of autonomous coding agents, remains limited. We analyse hot fixes present in over 61,000 GitHub repositories from the Hao-Li/AIDev dataset and find consistent patterns of urg…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Learning Over-Relaxation Policies for ADMM with Convergence Guarantees

2026-04-29 · Junan Lin, Paul J. Goulart, Luca Furieri

General AI

The Alternating Direction Method of Multipliers (ADMM) is a widely used method for structured convex optimization, and its practical performance depends strongly on the choice of penalty and relaxation parameters. Motivated by settings such as Model Predictive Control (MPC), where one repeatedly solves related optimiza…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

MISES: Minimal Information Sufficiency for Effective Service

2026-04-29 · Joss Armstrong

General AI

Category-based coordination mechanisms allocate resources by mapping a declared service category to a fixed resource profile, without observing individual demand types. We establish three results for this class of mechanisms. First, the relative welfare gap Delta satisfies a tight two-sided bound in terms of the aggreg…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Many-to-many stable matching in large economies

2026-04-29 · Michael Greinecke, Karolina Vocke

General AI

We study stability notions for networked many-to-many matching markets with individually insignificant agents in distributional form. Outcomes are formulated as joint distributions over characteristics of agents and contract choices. Characteristics can lie in an arbitrary Polish space. We provide a mechanical method f…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Resolving growth-induced off-stoichiometry in AgCrSe$_2$ single crystals

2026-04-29 · Felix Eder, Zeno Maesen, Yurii Skourski, Enrico Giannini, Oksana Zaharko, Fabian O. von Rohr

General AI

The layered delafossite-like antiferromagnet AgCrSe$_2$ is a superionic conductor at high temperatures and has been reported to exhibit anomalous Hall behavior and Kondo physics at low temperatures. These extraordinary transport properties have been established almost exclusively on single crystals grown by chemical va…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Rethinking Nonlocality: Locality, Counterfactuals, and the EPR-Bell Argument

2026-04-29 · Partha Ghose

General AI

The widespread claim that violations of Bell inequalities establish the nonlocality of nature is critically reexamined. It is argued that this conclusion is not logically compelled by either the Einstein--Podolsky--Rosen (EPR) argument or Bell's theorem. The analysis highlights the central role of counterfactual reason…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Phase Transitions in Economic Inequality: Taxation and Extremal Replacement Dynamics

2026-04-30 · Lautaro Giordano, Sebastian Gonçalves, José Roberto Iglesias, María Fabiana Laguna

General AI

We present a minimal agent-based model of interacting agents characterized by their wealth to study taxation and inequality in a non-conservative economy. Wealth evolves through an extremal stochastic replacement process in which the poorest agent has its wealth replaced by a new random value, financed through a collec…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Post-Optimization Adaptive Rank Allocation for LoRA

2026-04-30 · Vishnuprasadh Kumaravelu, Sunil Gupta, P. K. Srijith

General AI

Exponential growth in the scale of modern foundation models has led to the widespread adoption of Low-Rank Adaptation (LoRA) as a parameter-efficient fine-tuning technique. However, standard LoRA implementations disregard the varying intrinsic dimensionality of model layers and enforce a uniform rank, leading to parame…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Penalized Likelihood for Dyadic Network Formation Models with Degree Heterogeneity

2026-05-01 · Zizhong Yan, Jingrong Li, Yi Zhang

General AI

Estimating network formation models with degree heterogeneity raises two problems in empirical networks. First, agents that send no links, receive no links, or link to all remaining agents can make the fixed-effects MLE fail to exist. Trimming these agents changes the estimation sample and induces selection bias. Secon…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Posterior Augmented Flow Matching

2026-05-01 · George Stoica, Sayak Paul, Matthew Wallingford, Vivek Ramanujan, Abhay Nori, Winson Han, Ali Farhadi, Ranjay Krishna, Judy Hoffman

General AI

Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and intermediate point, yielding an extremely sparse and high-variance training signal. This …

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Simpson's paradox explains the ubiquity of nonlinear, threshold, and complex contagions

2026-05-01 · Laurent Hébert-Dufresne, Antoine Allard, Jean-Gabriel Young, William H. W. Thompson, Guillaume St-Onge

General AI

Complex contagions describe systems where the probability or rate of contagious transmission is a nonlinear function of the exposure to contagious agents. These models were first studied theoretically but have since been used to capture effects such as nonconformism, social reinforcement or peer pressure in empirical d…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks

2026-05-01 · Jingxi Pu, Tonghua Liu, Zhilin Guan, Siqiao Li, Yang Ming, Zheng Cong, Wei Zhang, Fangwei Li

General AI

With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising problem of low-dose computed tomography using deep learning. Although low-dose computed tomography reduces radiation exposure to patients, it also introduces more noise, …

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Equilibrium Stability and Uniqueness with a Large Number of Commodities and Patient Consumers

2026-05-04 · Xinyang Wang

General AI

We show that a large effective number of commodities can be a source of equilibrium stability and uniqueness: expanding substitution opportunities strengthens aggregate substitution effects. We study finite dated-commodity exchange economies obtained by truncating a countably infinite-horizon environment with discounte…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Private Speech Classification without Collapse: Stabilized DP Training and Offline Distillation

2026-05-04 · Yadi Wen, Tianxin Li, Enji Liang, Rong Du, Yue Fu

General AI

We study example-level private supervised speech classification under a practical release constraint: training may access privileged side information, but the released model must be audio-only. This setting is important because speech systems can often exploit richer side information during development, whereas deploym…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Uncountably many conditionally inaccessible decisions exist in every finite probability space

2026-05-04 · Zalán Gyenis, Miklós Rédei, Leszek Wroński

General AI

In a recent paper \cite{Redei-Jing2026} the notion of conditional $p$-inaccessibility of a decision based on utility maximization was defined and examples of conditionally $p$-inaccessible decisions were given. The conditional inaccessibility of a decision based on maximizing utility calculated by a probability measure…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

When Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIs

2026-05-04 · Haorui Li, Zhenghui He, Xuanzi Liu, Yang Xu, Dongsheng Liu, Jiakang Ma, Lupan Wu, Yangjie Wu, Xiongchao Tang, Tianhui Shi

General AI

Open-weight large language models (LLMs) are often described as downloadable model artifacts, but in production they are increasingly consumed as hosted APIs. This paper studies the intermediary service layer that turns a model release into an operational endpoint. Using sampled request logs, provider metadata, compati…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

HUGO-CS: A Hybrid-Labeled, Uncertainty-Aware, General-Purpose, Observational Dataset for Cold Spray

2026-05-05 · Stephen Price, Kyle Miller, Marco Musto, Kenneth Kroenlein, James Saal, Kyle Tsaknopoulos, Elke A. Rundensteiner, Danielle L. Cote

General AI

Cold spraying is an increasingly common approach for repairing and manufacturing components due to its solid-state manufacturing capabilities. However, process optimization remains difficult due to many interdependent parameters and the lack of large-scale, machine-readable data to support modeling. While the scientifi…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.0

Structural Graph Probing of Vision-Language Models

2026-03-28 · Haoyu He, Yue Zhuo, Yu Zheng, Qi R. Wang

General AI

Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly understood. In this work, we study VLMs through the lens of neural topology, representing each layer as a within-layer correlation graph derived from neuron-neuron co-activa…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.0

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

2026-03-29 · Yue Huang, Yu Jiang, Wenjie Wang, Haomin Zhuang, Xiaonan Luo, Yuchen Ma, Zhangchen Xu, Zichen Chen, Nuno Moniz, Zinan Lin, Pin-Yu Chen, Nitesh V Chawla, Nouha Dziri, Huan Sun, Xiangliang Zhang

General AI

Multi-agent systems composed of large generative models are rapidly moving from laboratory prototypes to real-world deployments, where they jointly plan, negotiate, and allocate shared resources to solve complex tasks. While such systems promise unprecedented scalability and autonomy, their collective interaction also …

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.0

CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

2026-03-30 · Tianle Zeng, Hanxuan Chen, Yanci Wen, Hong Zhang

General AI

The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.0

OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training

2026-03-30 · Haiyue Song, Masao Utiyama

General AI

Continual pre-training is widely used to adapt LLMs to target languages and domains, yet the mixture ratio of training data remains a sensitive hyperparameter that is expensive to tune: they must be fixed before training begins, and a suboptimal choice can waste weeks of compute. In this work, we propose OptiMer, which…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.0

HDP: A Lightweight Cryptographic Protocol for Human Delegation Provenance in Agentic AI Systems

2026-04-06 · Asiri Dalugoda

General AI

Agentic AI systems increasingly execute consequential actions on behalf of human principals, delegating tasks through multi-step chains of autonomous agents. No existing standard addresses a fundamental accountability gap: verifying that terminal actions in a delegation chain were genuinely authorized by a human princi…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 5.0

QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization

2026-04-07 · Changxin Ke, Rui Zhang, Jiaming Guo, Yuanbo Wen, Li Ding, Shuo Wang, Xuyuan Zhu, Xiong Peng, Di Huang, Zidong Du, Xing Hu, Qi Guo, Yunji Chen

General AI

Large Language Models (LLMs) achieve strong program repair performance but often suffer from over-editing, where excessive modifications overwrite correct code and hinder bug localization. We systematically quantify its impact and introduce the precise repair task, which maximizes reuse of correct code while fixing only bu…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 5.0

Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization

2026-04-27 · Xinxin Liu, Ming Li, Zonglin Lyu, Yuzhang Shang, Chen Chen

General AI

Human visual preferences are inherently multi-dimensional, encompassing aesthetics, detail fidelity, and semantic alignment. However, existing datasets provide only single, holistic annotations, resulting in severe label noise: images that excel in some dimensions but are deficient in others are simply marked as winner…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.0

Soft Anisotropic Diagrams for Differentiable Image Representation

2026-04-27 · Laki Iinbor, Zhiyang Dou, Wojciech Matusik

General AI

We introduce Soft Anisotropic Diagrams (SAD), an explicit and differentiable image representation parameterized by a set of adaptive sites in the image plane. In SAD, each site specifies an anisotropic metric and an additively weighted distance score, and we compute pixel colors as a softmax blend over a small per-pixe…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

Frequency Switching Mechanism for Parameter-Efficient Multi-Task Learning

2026-03-22 · Shih-Wen Liu, Yen-Chang Chen, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang

General AI

Multi-task learning (MTL) aims to enable a single model to solve multiple tasks efficiently; however, current parameter-efficient fine-tuning (PEFT) methods remain largely limited to single-task adaptation. We introduce Free Sinewich, a parameter-efficient multi-task learning framework that enables near-zero-c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 4.8

Designing Any Imaging System from Natural Language: Agent-Constrained Composition over a Finite Primitive Basis

2026-03-26 · Chengshuai Yang

General AI

Designing a computational imaging system -- selecting operators, setting parameters, validating consistency -- requires weeks of specialist effort per modality, creating an expertise bottleneck that excludes the broader scientific community from prototyping imaging instruments. We introduce spec.md, a structured specif…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

The Kitchen Loop: User-Spec-Driven Development for a Self-Evolving Codebase

2026-03-26 · Yannick Roy

General AI

Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User x 1000', where an L…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Hybrid Diffusion Model for Breast Ultrasound Image Augmentation

2026-03-27 · Farhan Fuad Abir, Sanjeda Sara Jennifer, Niloofar Yousefi, Laura J. Brattain

General AI

We propose a hybrid diffusion-based augmentation framework to overcome the critical challenge of ultrasound data augmentation in breast ultrasound (BUS) datasets. Unlike conventional diffusion-based augmentations, our approach improves visual fidelity and preserves ultrasound texture by combining text-to-image generati…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

EpiScreen: Early Epilepsy Detection from Electronic Health Records with Large Language Models

2026-03-30 · Shuang Zhou, Kai Yu, Zaifu Zhan, Huixue Zhou, Min Zeng, Feng Xie, Zhiyi Sha, Rui Zhang

General AI

Epilepsy and psychogenic non-epileptic seizures often present with similar seizure-like manifestations but require fundamentally different management strategies. Misdiagnosis is common and can lead to prolonged diagnostic delays, unnecessary treatments, and substantial patient morbidity. Although prolonged video-electr…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Geometry-aware similarity metrics for neural representations on Riemannian and statistical manifolds

2026-03-30 · N Alex Cayco Gajic, Arthur Pellegrino

General AI

Similarity measures are widely used to interpret the representational geometries used by neural networks to solve tasks. Yet, because existing methods compare the extrinsic geometry of representations in state space, rather than their intrinsic geometry, they may fail to capture subtle yet crucial distinctions between …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Multimodal Analytics of Cybersecurity Crisis Preparation Exercises: What Predicts Success?

2026-03-30 · Conrad Borchers, Valdemar Švábenský, Sandesh K. Kafle, Kevin K. Tang, Jan Vykopal

General AI

Instructional alignment, the match between intended cognition and enacted activity, is central to effective instruction but hard to operationalize at scale. We examine alignment in cybersecurity simulations using multimodal traces from 23 teams (76 students) across five exercise sessions. Study 1 codes objectives and t…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Rethinking Language Model Scaling under Transferable Hypersphere Optimization

2026-03-30 · Liliang Ren, Yang Liu, Yelong Shen, Weizhu Chen

General AI

Scaling laws for large language models depend critically on the optimizer and parameterization. Existing hyperparameter transfer laws are mainly developed for first-order optimizers, and they do not structurally prevent training instability at scale. Recent hypersphere optimization methods constrain weight matrices to …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Learning Structural-Functional Brain Representations through Multi-Scale Adaptive Graph Attention for Cognitive Insight

2026-03-31 · Badhan Mazumder, Sir-Lord Wiafe, Aline Kotoski, Vince D. Calhoun, Dong Hye Ye

General AI

Understanding how brain structure and function interact is key to explaining intelligence yet modeling them jointly is challenging as the structural and functional connectome capture complementary aspects of organization. We introduced Multi-scale Adaptive Graph Network (MAGNet), a Transformer-style graph neural networ…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining

2026-04-02 · Junxuan Li, Rawal Khirodkar, Chengan He, Zhongshi Jiang, Giljoo Nam, Lingchen Yang, Jihyun Lee, Egor Zakharov, Zhaoen Su, Rinat Abdrashitov, Yuan Dong, Julieta Martinez, Kai Li, Qingyang Tan, Takaaki Shiratori, Matthew Hu, Peihong Guo, Xuhua Huang, Ariyan Zarei, Marco Pesavento, Yichen Xu, He Wen, Teng Deng, Wyatt Borsos, Anjali Thakrar, Jean-Charles Bazin, Carsten Stoll, Ginés Hidalgo, James Booth, Lucy Wang, Xiaowen Ma, Yu Rong, Sairanjith Thalanki, Chen Cao, Christian Häne, Abhishek Kar, Sofien Bouaziz, Jason Saragih, Yaser Sheikh, Shunsuke Saito

General AI

High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data due to limited scale and the domain gap betw…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

SPAR: Single-Pass Any-Resolution ViT for Open-vocabulary Segmentation

2026-04-02 · Naomi Kombol, Ivan Martinović, Siniša Šegvić, Giorgos Tolias

General AI

Foundational Vision Transformers (ViTs) have limited effectiveness in tasks requiring fine-grained spatial understanding, due to their fixed pre-training resolution and inherently coarse patch-level representations. These challenges are especially pronounced in dense prediction scenarios, such as open-vocabulary segmen…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

The Self Driving Portfolio: Agentic Architecture for Institutional Asset Management

2026-04-02 · Andrew Ang, Nazym Azimbayev, Andrey Kim

General AI

Agentic AI shifts the investor's role from analytical execution to oversight. We present an agentic strategic asset allocation pipeline in which approximately 50 specialized agents produce capital market assumptions, construct portfolios using over 20 competing methods, and critique and vote on each other's output. A r…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

MMTalker: Multiresolution 3D Talking Head Synthesis with Multimodal Feature Fusion

2026-04-03 · Bin Liu, Zhixiang Xiong, Zhifen He, Bo Li

General AI

Speech-driven three-dimensional (3D) facial animation synthesis aims to build a mapping from one-dimensional (1D) speech signals to time-varying 3D facial motion signals. Current methods still face challenges in maintaining lip-sync accuracy and producing realistic facial expressions, primarily due to the highly ill-po…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

Are Latent Reasoning Models Easily Interpretable?

2026-04-06 · Connor Dilgren, Sarah Wiegreffe

General AI

Latent reasoning models (LRMs) have attracted significant research interest due to their low inference cost (relative to explicit reasoning models) and theoretical ability to explore multiple reasoning paths in parallel. However, these benefits come at the cost of reduced interpretability: LRMs are difficult to monitor…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection

2026-04-06 · Yang Li, Qiang Sheng, Zhengjia Wang, Yehan Yang, Danding Wang, Juan Cao

General AI

The misuse of large language models (LLMs) requires precise detection of synthetic text. Existing works mainly follow binary or ternary classification settings, which can only distinguish pure human/LLM text or collaborative text at best. This remains insufficient for the nuanced regulation, as the LLM-polished human t…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection

2026-04-06 · Vadim Vashkelis, Natalia Trukhina

General AI

Mixture-of-Experts (MoE) architectures enable conditional computation by activating only a subset of model parameters for each input. Although sparse routing has been highly effective in language models and has also shown promise in vision, most vision MoE methods operate at the image or patch level. This granularity i…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Your Pre-trained Diffusion Model Secretly Knows Restoration

2026-04-06 · Sudarshan Rajagopalan, Vishal M. Patel

General AI

Pre-trained diffusion models have enabled significant advancements in All-in-One Restoration (AiOR), offering improved perceptual quality and generalization. However, diffusion-based restoration methods primarily rely on fine-tuning or Control-Net style modules to leverage the pre-trained diffusion model's priors for A…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Artificial Intelligence and the Structure of Mathematics

2026-04-07 · Maissam Barkeshli, Michael R. Douglas, Michael H. Freedman

General AI

Recent progress in artificial intelligence (AI) is unlocking transformative capabilities for mathematics. There is great hope that AI will help solve major open problems and autonomously discover new mathematical concepts. In this essay, we further consider how AI may open a grand perspective on mathematics by forging …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems

2026-04-07 · Yasmeen Saeed, Ahmed Sharshar, Mohsen Guizani

General AI

Detecting cyberattacks in photovoltaic (PV) monitoring and MPPT control signals requires models that are robust to bias, drift, and transient spikes, yet lightweight enough for resource-constrained edge controllers. While deep learning outperforms traditional physics-based diagnostics and handcrafted features, standard…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models

2026-04-07 · Zhengming Yu, Li Ma, Mingming He, Leo Isikdogan, Yuancheng Xu, Dmitriy Smirnov, Pablo Salamanca, Dao Mi, Pablo Delgado, Ning Yu, Julien Philip, Xin Li, Wenping Wang, Paul Debevec

General AI

Most digital videos are stored in 8-bit low dynamic range (LDR) formats, where much of the original high dynamic range (HDR) scene radiance is lost due to saturation and quantization. This loss of highlight and shadow detail precludes mapping accurate luminance to HDR displays and limits meaningful re-exposure in post-…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering

2026-04-07 · Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo, Parisa Hamedi, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani

General AI

Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise in translating low-level representations into high-level source code, most exis…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

2026-04-09 · Jiayuan Ye, Vitaly Feldman, Kunal Talwar

General AI

Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on knowledge-intensive tasks. In this paper, we formalize fact memorization from an information-theoretic perspective and study how training data distributions affect fact ac…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 4.8

Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases

2026-04-27 · Jun Li, Mingxuan Liu, Jiazhen Pan, Che Liu, Wenjia Bai, Cosmin I. Bercea, Julia A. Schnabel

General AI

Clinical abnormality grounding for rare diseases is often hindered by data scarcity, making supervised fine-tuning impractical and single-pass inference highly unstable. We propose Dynamic Decision Learning (DDL), a framework that enables frozen large vision-language models (LVLMs) to refine their decisions across both…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

Personalized Multi-Interest Modeling for Cross-Domain Recommendation to Cold-Start Users

2026-04-28 · Xiaodong Li, Jiawei Sheng, Jiangxia Cao, Xinghua Zhang, Wenyuan Zhang, Yong Sun, Shirui Pan, Zhihong Tian, Tingwen Liu

General AI

Cross-domain recommendation (CDR) has been demonstrated to be an effective solution for alleviating the user cold-start issue. By leveraging rich user-item interactions available in a richly informative source domain, CDR could improve the recommendation performance for cold-start users in the target domain. Previous CDR ap…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

Pythia: Toward Predictability-Driven Agent-Native LLM Serving

2026-04-28 · Shan Yu, Junyi Shu, Yuanjiang Ni, Kun Qian, Xue Li, Yang Wang, Jinyuan Zhang, Ziyi Xu, Shuo Yang, Lingjun Zhu, Ennan Zhai, Qingda Lu, Jiarong Xing, Youyou Lu, Xin Jin, Xuanzhe Liu, Harry Xu

General AI

As LLM applications grow more complex, developers are increasingly adopting multi-agent architectures to decompose workflows into specialized, collaborative components, introducing structure that constrains agent behavior and exposes useful semantic predictability. Unlike traditional LLM serving, which operates under h…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

Slice Agent: Identifying and Isolating Slices in Shared Open Radio Unit

2026-04-28 · Felipe Arnholda, Flavio Rocha, Lucio Prade, Cristiano Bonato Both

General AI

Network Slice as a Service (NSaaS) is a key enabler of Beyond Fifth Generation (5G) and Sixth Generation (6G) networks, supporting next-generation applications such as extended reality (XR), immersive services, and the tactile Internet. These networks must provide native support for slice-aware services across the enti…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.8

Towards Unified Multi-task EEG Analysis with Low-Rank Adaptation

2026-04-28 · Sicheng Dai, Kai Chen, Hongwang Xiao, Shan Yu, Qiwei Ye

General AI

Recent self-supervised pre-training methods for electroencephalogram (EEG) have shown promising results. However, the pre-trained models typically require full fine-tuning on each downstream task individually to achieve good performance. In practical applications involving multiple tasks, utilizing a separate model for…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model

2026-04-11 · Kunho Kim, Sumin Seo, Yongjun Cho, Hyungjin Chung

General AI

We propose EditCrafter, a high-resolution image editing method that operates without tuning, leveraging pretrained text-to-image (T2I) diffusion models to process images at resolutions significantly exceeding those used during training. Leveraging the generative priors of large-scale T2I diffusion models enables the de…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

2026-04-11 · Ivan Sedykh, Nikita Sorokin, Valentin Malykh

General AI

Recent advances in masked diffusion language models (MDLMs) narrow the quality gap to autoregressive LMs, but their sampling remains expensive because generation requires many full-sequence denoising passes with a large Transformer and, unlike autoregressive decoding, cannot benefit from KV caching. In this work, we ex…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation

2026-04-11 · Gordon Chen, Ziqi Huang, Ziwei Liu

General AI

Video diffusion models have achieved remarkable progress in generating high-quality videos. However, these models struggle to represent the temporal succession of multiple events in real-world videos and lack explicit mechanisms to control when semantic concepts appear, how long they persist, and the order in which mul…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

Zero-shot World Models Are Developmentally Efficient Learners

2026-04-11 · Khai Loong Aw, Klemen Kotar, Wanhee Lee, Seungwoo Kim, Khaled Jedoui, Rahul Venkatesh, Lilian Naing Chen, Michael C. Frank, Daniel L. K. Yamins

General AI

Young children demonstrate early abilities to understand their physical world, estimating depth, motion, object coherence, interactions, and many other aspects of physical scene understanding. Children are both data-efficient and flexible cognitive systems, creating competence despite extremely limited training data, w…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

Accelerating Speculative Decoding with Block Diffusion Draft Trees

2026-04-14 · Liran Ringel, Yaniv Romano

General AI

Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. DFlash shows that a block diffusion drafter can generate an entire draft block in a single forward pass and achieve state-of-the-art specula…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

EdgeDetect: Importance-Aware Gradient Compression with Homomorphic Aggregation for Federated Intrusion Detection

2026-04-16 · Noor Islam S. Mohammad

General AI

Federated learning (FL) enables collaborative intrusion detection without raw data exchange, but conventional FL incurs high communication overhead from full-precision gradient transmission and remains vulnerable to gradient inference attacks. This paper presents EdgeDetect, a communication-efficient and privacy-aware …

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification

2026-04-16 · Adam Rida

General AI

Every call to an LLM classification endpoint produces a labeled input-output pair already retained in production logs. These pairs constitute a free, growing training set: a lightweight surrogate trained on them can absorb a significant portion of future traffic at near-zero marginal inference cost. The open questions …

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion

2026-04-17 · Yuval Haitman, Amit Efraim, Joseph M. Francos

General AI

We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalit…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

Target-Oriented Pretraining Data Selection via Neuron-Activated Graph

2026-04-17 · Zijun Wang, Haoqin Tu, Weidong Zhou, Yiyang Zhou, Xiaohuan Zhou, Bingni Zhang, Weiguo Feng, Taifeng Wang, Cihang Xie, Fengze Liu

General AI

Everyday tasks come with a target, and pretraining models around this target is what turns them into experts. In this paper, we study target-oriented language model (LM) pretraining by introducing Neuron-Activated Graph Ranking (NAG-based Ranking), a training-free and interpretable framework for target pretraining data…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

Back to Repair: A Minimal Denoising Network for Time Series Anomaly Detection

2026-04-19 · Kadir-Kaan Özer, René Ebeling, Markus Enzweiler

General AI

We introduce JuRe (Just Repair), a minimal denoising network for time series anomaly detection that exposes a central finding: architectural complexity is unnecessary when the training objective correctly implements the manifold-projection principle. JuRe consists of a single depthwise-separable convolutional residual …

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

Coevolving Representations in Joint Image-Feature Diffusion

2026-04-19 · Theodoros Kouzelis, Spyros Gidaris, Nikos Komodakis

General AI

Joint image-feature generative modeling has recently emerged as an effective strategy for improving diffusion training by coupling low-level VAE latents with high-level semantic features extracted from pre-trained visual encoders. However, existing approaches rely on a fixed representation space, constructed independen…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

2026-04-21 · Zhengwentai Sun, Keru Zheng, Chenghong Li, Hongjie Liao, Xihe Yang, Heyuan Li, Yihao Zhi, Shuliang Ning, Shuguang Cui, Xiaoguang Han

General AI

Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-view data. Existing methods often address these factors separately, resulting in limited controllability or reduced visual quality. We revisit this problem from an imag…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

Tadabur: A Large-Scale Quran Audio Dataset

2026-04-21 · Faisal Alherran

General AI

Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1400 hours of recitation audio from over 600 distinct reciters, providing substantial variation …

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

Near-Future Policy Optimization

2026-04-22 · Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Dingyu Yao, Zheng Lin, Peng Fu, Nan Duan, Jiaqi Wang

General AI

Reinforcement learning with verifiable rewards (RLVR) has become a core post-training recipe. Introducing suitable off-policy trajectories into on-policy exploration accelerates RLVR convergence and raises the performance ceiling, yet finding a source of such trajectories remains the key challenge. Existing mixed-polic…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction

2026-04-23 · Shiyan Su, Ruyi Zha, Danli Shi, Hongdong Li, Xuelian Cheng

General AI

Neural representations (NRs), such as neural fields and 3D Gaussians, effectively model volumetric data in computed tomography (CT) but suffer from severe artifacts under sparse-view settings. To address this, we propose DiffNR, a novel framework that enhances NR optimization with diffusion priors. At its core is Slice…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Graph-Enhanced LLM for SWAN-ISAC

2026-04-11 · Qian Gao, Ruikang Zhong, Yuanwei Liu

General AI

Segmented pinching antenna assisted integrated sensing and communication (ISAC) systems enable flexible spatial resource utilization by allowing different waveguide segments to be dynamically configured for transmission and reception. However, the resulting design requires the joint optimization of antenna deployment, …

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Identifying Disruptive Models in the Open-Source LLM Community

2026-04-13 · Xiaoting Wei, Lele Kang, Xuelian Pan, Jiannan Yang

General AI

The rapid growth of open-source large language models (LLMs) has created a complex ecosystem of model inheritance and reuse. However, existing research has focused mainly on descriptive analyses of lineage evolution, with limited attention to identifying which models play a disruptive role in shaping subsequent develop…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Koopman Representations for Non-Vanishing Time Intervals: An Optimization Approach and Sampling Effects

2026-04-13 · Younghwan Cho, Richard Sowers

General AI

Koopman operator theory is a key tool in data assimilation of complex dynamical systems, with the potential to be applied to multimodal data. We formulate the problem of learning Koopman eigenfunctions from observations at arbitrary, possibly non-vanishing, time intervals as an optimization problem. Analysis of the for…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Towards Automated Pentesting with Large Language Models

2026-04-13 · Ricardo Bessa, Rui Claro, João Trindade, João Lourenço

General AI

Large Language Models (LLMs) are redefining offensive cybersecurity by allowing the generation of harmful machine code with minimal human intervention. While attackers take advantage of dark LLMs such as XXXGPT and WolfGPT to produce malicious code, ethical hackers can follow similar approaches to automate traditional …

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Conflated Inverse Modeling to Generate Diverse and Temperature-Change Inducing Urban Vegetation Patterns

2026-04-14 · Baris Sarper Tezcan, Hrishikesh Viswanath, Rubab Saher, Daniel Aliaga

General AI

Urban areas are increasingly vulnerable to thermal extremes driven by rapid urbanization and climate change. Traditionally, thermal extremes have been monitored using Earth-observing satellites and numerical modeling frameworks. For example, land surface temperature derived from Landsat or Sentinel imagery is commonly …

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Personalizing LLM-Based Conversational Programming Assistants

2026-04-14 · Jonan Richards

General AI

Large Language Models (LLMs) have shown much promise in powering a variety of software engineering (SE) tools. Offering natural language as an intuitive interaction mechanism, LLMs have recently been employed as conversational "programming assistants" capable of supporting several SE activities simultaneously. As wit…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

OneHOI: Unifying Human-Object Interaction Generation and Editing

2026-04-15 · Jiun Tian Hoe, Weipeng Hu, Xudong Jiang, Yap-Peng Tan, Chee Seng Chan

General AI

Human-Object Interaction (HOI) modelling captures how humans act upon and relate to objects, typically expressed as <person, action, object> triplets. Existing approaches split into two disjoint families: HOI generation synthesises scenes from structured triplets and layout, but fails to integrate mixed conditions like…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Benchmarking Optimizers for MLPs in Tabular Deep Learning

2026-04-16 · Yury Gorishniy, Ivan Rubachev, Dmitrii Feoktistov, Artem Babenko

General AI

MLP is a heavily used backbone in modern deep learning (DL) architectures for supervised learning on tabular data, and AdamW is the go-to optimizer used to train tabular DL models. Unlike architecture design, however, the choice of optimizer for tabular DL has not been examined systematically, despite new optimizers sh…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Knowing that you do not know everything

2026-04-16 · Alex A. T. Rathke

General AI

We show that a rational agent with true and refinable knowledge of events cannot know if she knows everything or not. This epistemic limitation is not resolved by introspection about tautologies or by learning about new events.

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

2026-04-16 · Zhanhao Liang, Tao Yang, Jie Wu, Chengjian Feng, Liang Zheng

General AI

This paper focuses on the alignment of flow matching models with human preferences. A promising way is fine-tuning by directly backpropagating reward gradients through the differentiable generation process of flow matching. However, backpropagating through long trajectories results in prohibitive memory costs and gradi…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Simplifying Safety Proofs with Forward-Backward Reasoning and Prophecy

2026-04-16 · Eden Frenkel, Kenneth L. McMillan, Oded Padon, Sharon Shoham

General AI

We propose an incremental approach for safety proofs that decomposes a proof with a complex inductive invariant into a sequence of simpler proof steps. Our proof system combines rules for (i) forward reasoning using inductive invariants, (ii) backward reasoning using inductive invariants of a time-reversed system, and …

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Compositional Design, Implementation, and Verification of Swarms (Technical Report)

2026-04-17 · Florian Furbach, Lucas Clorius, Roland Kuhn, Hernán Melgratti, Alceste Scalas, Emilio Tuosto

General AI

Swarm protocols are a recently introduced formalism for specifying, implementing, and verifying peer-to-peer systems called swarms. A swarm consists of distributed agents called machines that communicate by asynchronous event propagation. Following a local-first model, each machine can progress without requiring contin…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Geometric regularization of autoencoders via observed stochastic dynamics

2026-04-17 · Sean Hill, Felix X.-F. Ye

General AI

Stochastic dynamical systems with slow or metastable behavior evolve, on long time scales, on an unknown low-dimensional manifold in high-dimensional ambient space. Building a reduced simulator from short-burst ambient ensembles is a long-standing problem: local-chart methods like ATLAS suffer from exponential landmark…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Phase transitions in Doi-Onsager, Noisy Transformer, and other multimodal models

2026-04-17 · Kyunghoo Mun, Matthew Rosenzweig

General AI

We study phase transitions for repulsive-attractive mean-field free energies on the circle. For a $\frac{1}{n+1}$-periodic interaction whose Fourier coefficients satisfy a certain decay condition, we prove that the critical coupling strength $K_c$ coincides with the linear stability threshold $K_\#$ of the uniform dist…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Real-Time Solution-Seeking for Game-Theoretic Autonomous Driving via Time-Distributed Iterations

2026-04-17 · Shaoqing Liu, Mushuang Liu

General AI

Computational complexity has been a major challenge in game-theoretic model predictive control (GT-MPC), as real-time solutions to a game (e.g., Nash equilibria (NEs)) have to be computed at each sampling instant of an MPC. This challenge is especially critical in autonomous driving, where interactions may involve many…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

AVISE: Framework for Evaluating the Security of AI Systems

2026-04-22 · Mikko Lempinen, Joni Kemppainen, Niklas Raesalmi

General AI

As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of high-profile exploits and consequential system failures. Yet systematic approaches to evaluating AI security remain underdeveloped. In this paper, we introduce AVISE (AI Vulner…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Modularity, Extensions and Connectivity in Infinite Matroids

2026-04-22 · Mattias Ehatamm, Peter Nelson, Fernanda Rivera Omana

General AI

We generalize the well-studied notion of a modular pair of a finite matroid to arbitrary families of sets in infinite matroids, and use it to develop the theory of infinite matroids in several as-yet-unexplored areas. Our results include a complete theory of single-element extensions, a description of the relationship …

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Foundation models for discovering robust biomarkers of neurological disorders from dynamic functional connectivity

2026-04-23 · Deepank Girish, Yi Hao Chan, Sukrit Gupta, Jing Xia, Jagath C. Rajapakse

General AI

Several brain foundation models (FM) have recently been proposed to predict brain disorders by modelling dynamic functional connectivity (FC). While they demonstrate remarkable model performance and zero- or few-shot generalization, the salient features identified as potential biomarkers are yet to be thoroughly evalua…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Low-Rank Adaptation Redux for Large Models

2026-04-23 · Bingcong Li, Yilang Zhang, Georgios B. Giannakis

General AI

Low-rank adaptation (LoRA) has emerged as the de facto standard for parameter-efficient fine-tuning (PEFT) of foundation models, enabling the adaptation of billion-parameter networks with minimal computational and memory overhead. Despite its empirical success and rapid proliferation of variants, it remains elusive whi…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Novelty-Based Generation of Continuous Landscapes with Diverse Local Optima Networks

2026-04-23 · Kippei Mizuta, Shoichiro Tanaka, Shuhei Tanaka, Toshiharu Hatanaka

General AI

Local Optima Networks (LONs) represent the global structure of search spaces as graphs, but their construction requires iterative execution of a search algorithm to find local optima and approximate transitions between Basins of Attraction (BoAs). In continuous optimization, this high computational cost prevents system…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

A dataset of early blockchain-registered AI agents on Ethereum

2026-04-24 · Yulin Liu

General AI

This study presents a structured dataset of blockchain-registered artificial intelligence agents under the ERC-8004 standard on Ethereum. The dataset integrates on-chain identity records, minting transactions, transfer events, reputation summaries, and individual feedback records, together with resolved off-chain metad…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Relaxation-Informed Training of Neural Network Surrogate Models

2026-04-24 · Calvin Tsay

General AI

ReLU neural networks trained as surrogate models can be embedded exactly in mixed-integer linear programs (MILPs), enabling global optimization over the learned function. The tractability of the resulting MILP depends on structural properties of the network, i.e., the number of binary variables in associated formulatio…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Spend Less, Fit Better: Budget-Efficient Scaling Law Fitting via Active Experiment Selection

2026-04-24 · Sijie Li, Shanda Li, Haowei Lin, Weiwei Sun, Ameet Talwalkar, Yiming Yang

General AI

Scaling laws are used to plan multi-million-dollar training runs, but fitting those laws can itself cost millions. In modern large-scale workflows, assembling a sufficiently informative set of pilot experiments is already a major budget-allocation problem rather than a routine preprocessing step. We formulate scaling-l…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Vibe coding for clinicians: democratising bespoke software development for digital health innovation

2026-04-24 · Ariel Yuhan Ong, Iain Livingstone, Caroline Kilduff, Mertcan Sevgi, David A Merle, Eden Ruffell, Pearse A Keane, Fares Antaki

General AI

Clinicians often face workflow problems that are perceived as either too bespoke or low stakes to attract commercial attention. Historically, most do not have the technical knowledge to address these problems, but the recent emergence of "vibe coding" presents a transformative opportunity. Vibe coding refers to the co-…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Learn&Drop: Fast Learning of CNNs based on Layer Dropping

2026-04-25 · Giorgio Cruciata, Luca Cruciata, Liliana Lo Presti, Jan Van Gemert, Marco La Cascia

General AI

This paper proposes a new method to improve the training efficiency of deep convolutional neural networks. During training, the method evaluates scores to measure how much each layer's parameters change and whether the layer will continue learning or not. Based on these scores, the network is scaled down such that the …

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Bayesian inference for hidden Markov models under genuine multimodality with application to ecological time series

2026-04-27 · Marco A. Gallegos-Herrada, Vianey Leos-Barajas, Jeffrey S. Rosenthal

General AI

Bayesian inference in hidden Markov models (HMMs) can be challenging due to the presence of multimodality in the likelihood function, and consequently in the joint posterior distribution, even after correcting for label switching. The parallel tempering (PT) algorithm, a state-space augmentation method, is a widely use…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Distributional Robustness of Linear Contracts

2026-04-27 · Shiliang Zuo

General AI

Linear contracts are ubiquitous in practice, yet optimal contract theory often prescribes complex, nonlinear structures. We provide a distributional robustness justification for linear contracts. We study a principal-agent problem where the agent exerts costly effort across multiple tasks, generating a stochastic signa…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Personalized Worked Example Generation from Student Code Submissions using Pattern-based Knowledge Components

2026-04-27 · Griffin Pitts, Muntasir Hoq, Peter Brusilovsky, Narges Norouzi, Arto Hellas, Juho Leinonen, Bita Akram

General AI

Adaptive programming practice often relies on fixed libraries of worked examples and practice problems, which require substantial authoring effort and may not correspond well to the logical errors and partial solutions students produce while writing code. As a result, students may receive learning content that does not…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Verification of Correlated Equilibria in Concurrent Reachability Games

2026-04-27 · Senthil Rajasekaran, Jean-François Raskin, Moshe Y. Vardi

General AI

As part of an effort to apply the rigorous guarantees of formal verification to multi-agent systems, the field of equilibrium analysis, also called rational verification, studies equilibria in multiplayer games to reason about system-level properties such as safety and scalability. While most prior work focuses on dete…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

Representation Alignment for Just Image Transformers is not Easier than You Think

2026-03-15 · Jaeyo Shin, Jiwook Kim, Hyunjung Shim

General AI

Representation Alignment (REPA) has emerged as a simple way to accelerate Diffusion Transformer training in latent space. At the same time, pixel-space diffusion transformers such as Just Image Transformers (JiT) have attracted growing attention because they remove a dependency on a pretrained tokenizer, and then avoi…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

2026-03-25 · Danil Tokhchukov, Aysel Mirzoeva, Andrey Kuznetsov, Konstantin Sobolev

General AI

In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can significantly improve the performance of DiT blocks. Building on this i…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching

2026-03-25 · Yihan Wang, Jia Deng

General AI

We introduce WAFT-Stereo, a simple and effective warping-based method for stereo matching. WAFT-Stereo demonstrates that cost volumes, a common design used in many leading methods, are not necessary for strong performance and can be replaced by warping with improved efficiency. WAFT-Stereo ranks first on ETH3D, KITTI a…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders

2026-03-26 · Niccolò Cavagnero, Narges Norouzi, Gijs Dubbelman, Daan de Geus

General AI

Vision Foundation Models (VFMs) pre-trained at scale enable a single frozen encoder to serve multiple downstream tasks simultaneously. Recent VFM-based encoder-only models for image and video segmentation, such as EoMT and VidEoMT, achieve competitive accuracy with remarkably low latency, yet they require finetuning th…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

2026-03-26 · Ligong Han, Hao Wang, Han Gao, Kai Xu, Akash Srivastava

General AI

Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressi…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling

2026-03-27 · Ruixing Zhang, Hanzhang Jiang, Leilei Sun, Liangzhe Han, Jibin Wang, Weifeng Lv

General AI

Mobile devices continuously interact with cellular base stations, generating massive volumes of signaling records that provide broad coverage for understanding human mobility. However, such records offer only coarse location cues (e.g., serving-cell identifiers) and therefore limit their direct use in applications that…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention

2026-03-30 · Yufei Xu, Fanxu Meng, Fan Jiang, Yuxuan Wang, Ruijie Zhou, Jiexi Wu, Zhixin Pan, Zhaohui Wang, Xiaojuan Tang, Wenjie Pei, Tongxuan Liu, Di Yin, Xing Sun, Muhan Zhang

General AI

Token-level sparse attention mechanisms, exemplified by DeepSeek Sparse Attention (DSA), achieve fine-grained key selection by scoring every historical token for each query using a lightweight indexer, and then computing attention only over the selected subset. While the downstream sparse attention scales efficiently, …

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

Training a Student Expert via Semi-Supervised Foundation Model Distillation

2026-04-04 · Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, Zhengzhong Tu

General AI

Foundation models deliver strong perception but are often too computationally heavy to deploy, and adapting them typically requires costly annotations. We introduce a semi-supervised knowledge distillation (SSKD) framework that compresses pre-trained vision foundation models (VFMs) into compact experts using limited la…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

2026-04-06 · DataFlow Team, Bohan Zeng, Daili Hua, Kaixin Zhu, Yifan Dai, Bozhou Li, Yuran Wang, Chengzhuo Tong, Yifan Yang, Mingkun Chang, Jianbin Zhao, Zhou Liu, Hao Liang, Xiaochen Ma, Ruichuan An, Junbo Niu, Zimo Meng, Tianyi Bai, Meiyi Qiang, Huanyao Zhang, Zhiyou Xiao, Tianyu Guo, Qinhan Yu, Runhao Zhao, Zhengpin Li, Xinyi Huang, Yisheng Pan, Yiwen Tang, Yang Shi, Yue Ding, Xinlong Chen, Hongcheng Gao, Minglei Shi, Jialong Wu, Zekun Wang, Yuanxing Zhang, Xintao Wang, Pengfei Wan, Yiren Song, Mike Zheng Shou, Wentao Zhang

General AI

World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remains lacking. In this paper, we introduce OpenWorldLib, a comprehensive and standardized inference framework for Advanced World Models. Drawing on the evolution of world m…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

POS-ISP: Pipeline Optimization at the Sequence Level for Task-aware ISP

2026-04-08 · Jiyun Won, Heemin Yang, Woohyeok Kim, Jungseul Ok, Sunghyun Cho

General AI

Recent work has explored optimizing image signal processing (ISP) pipelines for various tasks by composing predefined modules and adapting them to task-specific objectives. However, jointly optimizing module sequences and parameters remains challenging. Existing approaches rely on neural architecture search (NAS) or st…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.0

ViVa: A Video-Generative Value Model for Robot Reinforcement Learning

2026-04-09 · Jindi Lv, Hao Li, Jie Li, Yifei Nie, Fankun Kong, Yang Wang, Xiaofeng Wang, Zheng Zhu, Chaojun Ni, Qiuping Deng, Hengtao Li, Jiancheng Lv, Guan Huang

General AI

Vision-language-action (VLA) models have advanced robot manipulation through large-scale pretraining, but real-world deployment remains challenging due to partial observability and delayed feedback. Reinforcement learning addresses this via value functions, which assess task progress and guide policy improvement. Howev…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

FCL-COD: Weakly Supervised Camouflaged Object Detection with Frequency-aware and Contrastive Learning

2026-03-24 · Jingchen Ni, Quan Zhang, Dan Jiang, Keyu Lv, Ke Zhang, Chun Yuan

General AI

Existing camouflaged object detection (COD) methods typically rely on fully-supervised learning guided by mask annotations. However, obtaining mask annotations is time-consuming and labor-intensive. Compared to fully-supervised methods, existing weakly-supervised COD methods exhibit significantly poorer performance. Eve…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding

2026-03-25 · Xiaoyu Tang, Jun Dong, Jintao Cheng, Rui Fan

General AI

Remote sensing visual grounding (RSVG) aims to localize specific targets in remote sensing images using natural language expressions. However, existing methods are restricted to single-sensor domains, i.e., either optical or synthetic aperture radar (SAR), limiting their real-world applicability. In this paper, we intr…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Conchordal: Emergent Harmony via Direct Cognitive Coupling in a Psychoacoustic Landscape

2026-03-26 · Koichi Takahashi

General AI

This paper introduces Conchordal, a bio-acoustic instrument for generative composition whose sonic agents are governed by artificial life dynamics within a psychoacoustic fitness landscape. The system is built on Direct Cognitive Coupling (DCC), a design principle requiring that generative dynamics operate directly wit…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

MegaFlow: Zero-Shot Large Displacement Optical Flow

2026-03-26 · Dingxi Zhang, Fangjinhua Wang, Marc Pollefeys, Haofei Xu

General AI

Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

On the Formalization of Network Topology Matrices in HOL

2026-03-26 · Kubra Aksoy, Adnan Rashid, Osman Hasan, Sofiene Tahar

General AI

Network topology matrices are algebraic representations of graphs that are widely used in modeling and analysis of various applications including electrical circuits, communication networks and transportation systems. In this paper, we propose to use Higher-Order-Logic (HOL) based interactive theorem proving to formali…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

2026-03-26 · Yawen Luo, Xiaoyu Shi, Junhao Zhuang, Yutian Chen, Quande Liu, Xintao Wang, Pengfei Wan, Tianfan Xue

General AI

Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. By reformulat…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Let Triggers Control: Frequency-Aware Dropout for Effective Token Control

2026-03-28 · Junyoung Koh, Hoyeon Moon, Dongha Kim, Seungmin Lee, Sanghyun Park, Min Song

General AI

Text-to-image models such as Stable Diffusion have achieved unprecedented levels of high-fidelity visual synthesis. As these models advance, personalization of generative models -- commonly facilitated through Low-Rank Adaptation (LoRA) with a dedicated trigger token -- has become a significant area of research. Previo…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

A Cross-Scale Decoder with Token Refinement for Off-Road Semantic Segmentation

2026-03-30 · Seongkyu Choi, Jhonghyun An

General AI

Off-road semantic segmentation is fundamentally challenged by irregular terrain, vegetation clutter, and inherent annotation ambiguity. Unlike urban scenes with crisp object boundaries, off-road environments exhibit strong class-level similarity among terrain categories, resulting in thick and uncertain transition regi…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

FlowIt: Global Matching for Optical Flow with Confidence-Guided Refinement

2026-03-30 · Sadra Safadoust, Fabio Tosi, Matteo Poggi, Fatma Güney

General AI

We present FlowIt, a novel architecture for optical flow estimation designed to robustly handle large pixel displacements. At its core, FlowIt leverages a hierarchical transformer architecture that captures extensive global context, enabling the model to effectively model long-range correspondences. To overcome the lim…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

FocusVLA: Focused Visual Utilization for Vision-Language-Action Models

2026-03-30 · Yichi Zhang, Weihao Yuan, Yizhuo Zhang, Xidong Zhang, Jia Wan

General AI

Vision-Language-Action (VLA) models improve action generation by conditioning policies on rich vision-language information. However, current auto-regressive policies are constrained by three bottlenecks: (1) architectural bias drives models to overlook visual details, (2) an excessive number of visual tokens makes atte…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

2026-03-30 · Lorenza Prospero, Orest Kupyn, Ostap Viniavskyi, João F. Henriques, Christian Rupprecht

General AI

Acquiring labeled datasets for 3D human mesh estimation is challenging due to depth ambiguities and the inherent difficulty of annotating 3D geometry from monocular images. Existing datasets are either real, with manually annotated 3D geometry and limited scale, or synthetic, rendered from 3D engines that provide preci…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

SHOW3D: Capturing Scenes of 3D Hands and Objects in the Wild

2026-03-30 · Patrick Rim, Kevin Harris, Braden Copple, Shangchen Han, Xu Xie, Ivan Shugurov, Sizhe An, He Wen, Alex Wong, Tomas Hodan, Kun He

General AI

Accurate 3D understanding of human hands and objects during manipulation remains a significant challenge for egocentric computer vision. Existing hand-object interaction datasets are predominantly captured in controlled studio settings, which limits both environmental diversity and the ability of models trained on such…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Superintelligence and Law

2026-03-30 · Noam Kolt

General AI

The prospect of artificial superintelligence -- AI agents that can generally outperform humans in cognitive tasks and economically valuable activities -- will transform the legal order as we know it. Operating autonomously or under only limited human oversight, AI agents will assume a growing range of roles in the lega…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Consensus-Based Multi-Objective Controller Synthesis

2026-03-31 · Ingyu Jang, Leila J. Bridgeman

General AI

Despite longstanding interest, controller synthesis remains challenging for networks of heterogeneous, nonlinear agents. Moreover, the requirements for computational scalability and information privacy have become increasingly critical. This paper introduces a dissipativity-based distributed controller synthesis framew…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Performative Scenario Optimization

2026-03-31 · Quanyan Zhu, Zhengye Han

General AI

This paper introduces a performative scenario optimization framework for decision-dependent chance-constrained problems. Unlike classical stochastic optimization, we account for the feedback loop where decisions actively shape the underlying data-generating process. We define performative solutions as self-consistent e…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Flexibility allocation in random bipartite matching markets: exact matching rates and dominance regimes

2026-04-02 · Taha Ameen, Flore Sentenac, Sophie H. Yu

General AI

This paper studies how a fixed flexibility budget should be allocated across the two sides of a balanced bipartite matching market. We model compatibilities via a sparse bipartite stochastic block model in which flexible agents are more likely to connect with agents on the opposite side, and derive an exact variational…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Hypernetwork-Conditioned Reinforcement Learning for Robust Control of Fixed-Wing Aircraft under Actuator Failures

2026-04-03 · Dennis Marquis, Mazen Farhood

General AI

This paper presents a reinforcement learning-based path-following controller for a fixed-wing small uncrewed aircraft system (sUAS) that is robust to certain actuator failures. The controller is conditioned on a parameterization of actuator faults using hypernetwork-based adaptation. We consider parameter-efficient for…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Comparing Human Oversight Strategies for Computer-Use Agents

2026-04-06 · Chaoran Chen, Zhiping Zhang, Zeya Chen, Eryue Xu, Yinuo Yang, Ibrahim Khalilov, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li

General AI

LLM-powered computer-use agents (CUAs) are shifting users from direct manipulation to supervisory coordination. Existing oversight mechanisms, however, have largely been studied as isolated interface features, making broader oversight strategies difficult to compare. We conceptualize CUA oversight as a structural coord…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Fully Procedural Synthetic Data from Simple Rules for Multi-View Stereo

2026-04-06 · Zeyu Ma, Alexander Raistrick, Jia Deng

General AI

In this paper, we explore the design space of procedural rules for multi-view stereo (MVS). We demonstrate that we can generate effective training data using SimpleProc: a new, fully procedural generator driven by a very small set of rules using Non-Uniform Rational Basis Splines (NURBS), as well as basic displacement …

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

PointTPA: Dynamic Network Parameter Adaptation for 3D Scene Understanding

2026-04-06 · Siyuan Liu, Chaoqun Zheng, Xin Zhou, Tianrui Feng, Dingkang Liang, Xiang Bai

General AI

Scene-level point cloud understanding remains challenging due to diverse geometries, imbalanced category distributions, and highly varied spatial layouts. Existing methods improve object-level performance but rely on static network parameters during inference, limiting their adaptability to dynamic scene data. We propo…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Rapid convergence of tempering chains to multimodal Gibbs measures

2026-04-06 · Seungjae Son

General AI

We study the spectral gaps of parallel and simulated tempering chains targeting multimodal Gibbs measures. In particular, we consider chains constructed from Metropolis random walks that preserve the Gibbs distributions at a sequence of harmonically spaced temperatures. We prove that their spectral gaps admit polynomia…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

2026-04-06 · Hyunsoo Cha, Wonjung Woo, Byungjun Kim, Hanbyul Joo

General AI

We present Vanast, a unified framework that generates garment-transferred human animation videos directly from a single human image, garment images, and a pose guidance video. Conventional two-stage pipelines treat image-based virtual try-on and pose-driven animation as separate processes, which often results in identi…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Action Images: End-to-End Policy Learning via Multiview Video Generation

2026-04-07 · Haoyu Zhen, Zixian Gao, Qiao Sun, Yilin Zhao, Yuncong Yang, Yilun Du, Tsun-Hsuan Wang, Yi-Ling Qiao, Chuang Gan

General AI

World action models (WAMs) have emerged as a promising direction for robot policy learning, as they can leverage powerful video backbones to model the future states. However, existing approaches often rely on separate action modules, or use action representations that are not pixel-grounded, making it difficult to full…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

On the Convergence of an Opinion-Action Coevolution Model with Bounded Confidence

2026-04-07 · Chen Song, Angela Fontan, Rong Su, Julien M. Hendrickx, Vladimir Cvetkovic, Karl H. Johansson

General AI

This paper presents a theoretical convergence analysis for an opinion-action coevolution model that integrates the opinion updating rule of the Hegselmann-Krause model with a utility-based decision-making mechanism. The model is reformulated into an augmented state-space representation, where the state matrix induces a…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

The Character Error Vector: Decomposable errors for page-level OCR evaluation

2026-04-07 · Jonathan Bourne, Mwiza Simbeye, Joseph Nockels

General AI

The Character Error Rate (CER) is a key metric for evaluating the quality of Optical Character Recognition (OCR). However, this metric assumes that text has been perfectly parsed, which is often not the case. Under page-parsing errors, CER becomes undefined, limiting its use as a metric and making evaluating page-level…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Value Mirror Descent for Reinforcement Learning

2026-04-07 · Zhichao Jia, Guanghui Lan

General AI

Value iteration-type methods have been extensively studied for computing a nearly optimal value function in reinforcement learning (RL). Under a generative sampling model, these methods can achieve sharper sample complexity than policy optimization approaches, particularly in their dependence on the discount factor. In…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

E-3DPSM: A State Machine for Event-Based Egocentric 3D Human Pose Estimation

2026-04-09 · Mayur Deshmukh, Hiroyasu Akada, Helge Rhodin, Christian Theobalt, Vladislav Golyanik

General AI

Event cameras offer multiple advantages in monocular egocentric 3D human pose estimation from head-mounted devices, such as millisecond temporal resolution, high dynamic range, and negligible motion blur. Existing methods effectively leverage these properties, but suffer from low 3D estimation accuracy, insufficient in…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

Learning vs. Optimizing Bidders in Budgeted Auctions

2026-04-09 · Giannis Fikioris, Balasubramanian Sivan, Éva Tardos

General AI

The study of repeated interactions between a learner and a utility-maximizing optimizer has yielded deep insights into the manipulability of learning algorithms. However, existing literature primarily focuses on independent, unlinked rounds, largely ignoring the ubiquitous practical reality of budget constraints. In th…

Review
pending
Role
unreviewed
Read
later
arxiv Score 3.8

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

2026-04-09 · Zhengyang Sun, Yu Chen, Xin Zhou, Xiaofan Li, Xiwu Chen, Dingkang Liang, Xiang Bai

General AI

Text-to-video diffusion models have enabled open-ended video synthesis, but often struggle with generating the correct number of objects specified in a prompt. We introduce NUMINA, a training-free identify-then-guide framework for improved numerical alignment. NUMINA identifies prompt-layout inconsistencies by selecti…

Review
pending
Role
unreviewed
Read
later