Research Paper Cockpit

Daily Digest - 2026-05-09

Papers first seen in this daily snapshot.

Research Workflow

Latest digest: 2026-05-13.

Papers

65 visible entries

arxiv Score 25.8

STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?

2026-05-07 · Hanxiang Chao, Yihan Bai, Rui Sheng, Tianle Li, Yushi Sun

Research Track A · General AI

Large Language Model (LLM) agents are increasingly expected to maintain coherent, long-term personalized memory, yet current benchmarks primarily measure static fact retrieval, overlooking the ability to revise stored beliefs when new evidence emerges. We identify a critical and underexplored failure mode, Implicit Con…

Review
pending
Role
unreviewed
Read
now
arxiv Score 23.5

Balancing Stability and Plasticity in Sequentially Trained Early-Exiting Neural Networks

2026-05-06 · Alaa Zniber, Ouassim Karrakchou, Mounir Ghogho

Research Track A · General AI

Early-exiting neural networks enable adaptive inference by allowing inputs to exit at intermediate classifiers, reducing computation for easy samples while maintaining high accuracy. In practice, exits can be trained sequentially by incrementally adding them to a shared backbone; however, this sequential training can c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.8

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

2026-05-07 · Bodong Du, Bowen Liu, Yang Yu, Xinpeng Ding, Zhiheng Wu, Shuning Wang, Shuo Nie, Naiming Liu, Qifeng Chen, Yangqiu Song, Xiaomeng Li

General AI

Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures contain highly redundant anatomical views, while decisive evidence is temporally sparse,…

Review
pending
Role
unreviewed
Read
now
huggingface Score 22.4

Audio-Visual Intelligence in Large Foundation Models

2026-05-05 · You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei

General AI

Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines that can perceive, generate, and interact in the multimodal real world. In the era of large foundation models, joint modeling of audio and vision has become increasing…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.0

CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

2026-05-07 · Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari

Research Track A · General AI

Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in thre…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.5

Attribution-Guided Continual Learning for Large Language Models

2026-05-06 · Yazheng Liu, Yuxuan Wan, Rui Xu, Xi Zhang, Sihong Xie, Hui Xiong

Research Track A · General AI

Large language models (LLMs) often suffer from catastrophic forgetting in continual learning: after learning new tasks sequentially, they perform worse on earlier tasks. Existing methods mitigate catastrophic forgetting by data replay, parameter freezing, or regularization. However, these methods lack semantic awarenes…

Review
pending
Role
unreviewed
Read
now
arxiv Score 21.4

HERCULES: Hardware-Efficient, Robust, Continual Learning Neural Architecture Search

2026-05-03 · Matteo Gambella, Fabrizio Pittorino, Manuel Roveri

Research Track A · General AI

Neural Architecture Search (NAS) has emerged as a powerful framework for automatically discovering neural architectures that balance accuracy and efficiency. However, as AI transitions from static benchmarks to real-world deployment, the traditional focus on hardware-aware efficiency is no longer sufficient. We observe…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.8

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

2026-05-07 · Tianle Wang, Zhaoyang Wang, Guangchen Lan, Xinpeng Wei, Sipeng Zhang, Guanwen Qiu, Abulhair Saparov

General AI

Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that offers independent …

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.5

Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics

2026-05-06 · Andreas Pattichis, Constantine Dovrolis

Research Track A · General AI

LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen wha…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.0

CoMemNet: Contrastive Sampling with Memory Replay Network for Continual Traffic Prediction

2026-05-07 · Mei Wu, Wenchao Weng, Wenxin Su, Wenjie Tang, Wei Zhou

Research Track A · General AI

In recent years, the integration of non-topological space modeling with temporal learning methods has emerged as an effective approach for capturing spatio-temporal information in non-Euclidean graphs. However, most existing methods rely on static underlying graph structures, which are inadequate for capturing the cont…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.9

cotomi Act: Learning to Automate Work by Watching You

2026-05-04 · Masafumi Oyamada, Kunihiro Takeoka, Kosuke Akimoto, Ryoma Obara, Masafumi Enomoto, Haochen Zhang, Daichi Haraguchi, Takuya Tamura

Research Track B · General AI

What if a browser agent could learn your work simply by watching you do it? We present cotomi Act, a browser-based computer-using agent that combines reliable multi-step task execution with persistent organizational knowledge learned from user behavior. For execution, an agent scaffold with adaptive lazy observation, v…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.0

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation

2026-05-06 · Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Stjepan Picek, Saraga Sakthidharan

Research Track A · General AI

The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank Adaptation (LoRA) modules. However, integrating these third-party adapters often induces catastrophic forgetting of the base model's foundational safety alignment. Restor…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.8

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

2026-05-07 · Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah, Corey Feld

General AI

Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation (RAG) that does not…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.4

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

2026-05-03 · Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang

General AI

Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, loc…

Review
pending
Role
unreviewed
Read
now
huggingface Score 18.0

Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO

2026-04-14 · Zhiyuan Zeng, Jiameng Huang, Zhangyue Yin, Jiashuo Liu, Ziniu Li, Bingrui Li, Yuhao Wu, Yining Zheng, Ge Zhang, Wenhao Huang, Xipeng Qiu

General AI

Reinforcement learning with verifiable rewards (RLVR) has become a central paradigm for improving reasoning and code generation in large language models, and GRPO-style training is widely adopted for its simplicity and effectiveness. However, an important design choice remains underexplored: how token-level policy grad…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

2026-05-07 · Mingwei Xu, Hao Fang

General AI

Reinforcement learning with verifiable rewards (RLVR), thanks to its deterministic verification, has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy Optimization (GRPO)…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation

2026-05-07 · Ziyu Zhai, Siyou Li, Juexi Shao, Juntao Yu

General AI

Developing ceramic glazes is a costly, time-consuming process of trial and error due to complex chemistry, placing a significant burden on independent artists. While recent advances in multimodal AI offer a modern solution, the field lacks the large-scale datasets required to train these models. We propose GlazyBench, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.8

SkillOS: Learning Skill Curation for Self-Evolving Agents

2026-05-07 · Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee

General AI

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existin…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Detecting Time Series Anomalies Like an Expert: A Multi-Agent LLM Framework with Specialized Analyzers

2026-05-07 · Hyeongwon Kang, Jeongseob Kim, Jinwoo Park, Pilsung Kang

General AI

Recent studies have explored large language models for time-series anomaly detection, yet existing approaches often rely on a single general-purpose model to directly infer anomaly indices or intervals, limiting controllability, interpretability, and reliability for complex anomaly patterns. We propose SAGE (Specialize…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

2026-05-07 · Isaac David, Arthur Gervais

General AI

Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many operational settings, the most accessible artifacts are binary packages rather than source patches or advisory text. This paper asks whether a language-model agent, restricted t…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

2026-05-07 · Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava

General AI

Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcomer searches an unfam…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

MINER: Mining Multimodal Internal Representation for Efficient Retrieval

2026-05-07 · Weien Li, Rui Song, Zeyu Li, Haochen Liu, Gonghao Zhang, Difan Jiao, Zhenwei Tang, Bowei He, Haolun Wu, Xue Liu, Ye Yuan

General AI

Visual document retrieval has become essential for accessing information in visually rich documents. Existing approaches fall into two camps. Late-interaction retrievers achieve strong quality through fine-grained token-level matching but store hundreds of vectors per page, incurring large index footprints and high ser…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

2026-05-07 · Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin

General AI

Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely purely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajec…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

2026-05-07 · Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao

General AI

Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches eithe…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.0

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

2026-05-07 · Yuxing Liu, Jianyu Wang, Tong Zhang

Research Track A · General AI

Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., forgetting less while achieving the same o…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches

2026-05-06 · Pietro Bonazzi, Youssef Ahmed, Daniel Eckert, Andrea Ronco, Junjie Zeng, Dengxin Dai, Michele Magno

General AI

Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multi-modal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commer…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

2026-05-07 · Hao Dong, Hongzhao Li, Shupan Li, Muhammad Haris Khan, Eleni Chatzi, Olga Fink

General AI

Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly a…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

Debiased Multimodal Personality Understanding through Dual Causal Intervention

2026-05-07 · Yangfu Zhu, Zitong Han, Nianwen Ning, Yuting Wei, Yuandong Wang, Hang Feng, Zhenzhou Shao

General AI

Multimodal personality understanding plays a critical role in human-centered artificial intelligence. Previous work mainly focuses on learning rich multimodal representations for video personality understanding. However, such methods often suffer from potential harm caused by subject bias (e.g., observable age and unobservable mental …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.8

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

2026-05-07 · Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang

General AI

Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, prim…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.0

BAMI: Training-Free Bias Mitigation in GUI Grounding

2026-05-07 · Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu

Research Track B · General AI

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution metho…

Review
pending
Role
unreviewed
Read
now
huggingface Score 13.0

BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models

2026-05-07 · Xin Gao, Ruiyi Zhang, Meixi Du, Peijia Qin, Pengtao Xie

Research Track A · General AI

Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensiv…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.0

On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR

2026-05-07 · Hao Ye, Jisheng Dang, Junfeng Fang, Bimei Wang, Yizhou Zhang, Ning Lv, Wencan Zhang, Hong Peng, Bin Hu, Tat-Seng Chua

Research Track A · General AI

Recent extensive research has demonstrated that the enhanced reasoning capabilities acquired by models through Reinforcement Learning with Verifiable Rewards (RLVR) are primarily concentrated within the rank-1 components. Predicated on this observation, we employed Periodic Rank-1 Substitution and identified a counteri…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

2026-05-07 · Omar El Khalifi, Thomas Rossi, Oscar Fossey, Thibault Fouque, Ulysse Mizrahi, Philip Torr, Ivan Laptev, Fabio Pizzati, Baptiste Bellot-Gurlet

General AI

For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.8

NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research

2026-05-07 · Lujia Zhong, Yihao Xia, Jianwei Zhang, Shuo Huang, Jiaxin Yue, Mingyang Xia, Yonggang Shi

General AI

Multimodal neuroimaging analysis often involves complex, modality-specific preprocessing workflows that require careful configuration, quality control, and coordination across heterogeneous toolchains. Beyond preprocessing, downstream statistical analysis and disease classification commonly require task-specific code, …

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.5

Beyond Forgetting in Continual Medical Image Segmentation: A Comprehensive Benchmark Study

2026-05-07 · Bomin Wang, Hangqi Zhou, Yibo Gao, Xiahai Zhuang

Research Track A · General AI

Continual learning (CL) is essential for deploying medical image segmentation models in clinical environments where imaging domains, anatomical targets, and diagnostic tasks evolve over time. However, continual segmentation still faces three main challenges. First, the scenarios for this task remain insufficiently stan…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.0

PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

2026-05-07 · Xinmiao Huang, Jinwei Hu, Rajarshi Roy, Changshun Wu, Yi Dong, Xiaowei Huang

Research Track B · General AI

Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. We introduce PrefixG…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

2026-05-07 · Daniel Zheng, Ingrid von Glehn, Yori Zwols, Iuliya Beloshapka, Lars Buesing, Daniel M. Roy, Martin Wattenberg, Bogdan Georgiev, Tatiana Schmidt, Andrew Cowie, Fernanda Viegas, Dimitri Kanevsky, Vineet Kahlon, Hartmut Maennel, Sophia Alj, George Holland, Alex Davies, Pushmeet Kohli

General AI

We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature search, computation…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.8

Recursive Agent Optimization

2026-05-07 · Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig

General AI

We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer c…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.5

Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning

2026-05-06 · William T. Redman, Erik C. Johnson, Brian Robinson

Research Track A · General AI

Identifying and exploiting common features across domains is at the heart of the human ability to make analogies, and is believed to be crucial for the ability to continually learn. To do this successfully, general and flexible computational strategies must be developed. While the extent to which Transformer neural net…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.5

Scene-Adaptive Continual Learning for CSI-based Human Activity Recognition with Mixture of Experts

2026-05-07 · Wenhan Zheng, Yuyi Mao, Ivan Wang-Hei Ho

Research Track A

Channel state information (CSI)-based human activity recognition (HAR) is vulnerable to performance degradation under domain shifts across varying physical environments. Continual learning (CL) offers a principled way to learn new domains sequentially while preserving past knowledge, but existing CL solutions for CSI-b…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.4

CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness

2026-04-30 · Haofei Yu, Yining Zhao, Lenore Blum, Manuel Blum, Paul Pu Liang

Research Track B · General AI

Despite remarkable advances, today's AI systems remain narrow in scope, falling short of the flexible, adaptive, and multisensory intelligence that characterizes human capabilities. This gap has fueled longstanding debates about whether AI might one day achieve human-like generality or even consciousness, and whether t…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.8

PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation

2026-05-06 · Srikar Kashyap Pulipaka

General AI

We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language mode…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.8

SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

2026-05-07 · Xiaofang Xiao, Guangchao Li, Guangrong Zhao, Qi Lin, Wen Ma, Hongkai Wen, Yanxiang Wang, Yiran Shen

General AI

Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.5

WAAA! Web Adversaries Against Agentic Browsers

2026-05-06 · Sohom Datta, Alex Nahapetyan, William Enck, Alexandros Kapravelos

Research Track B · General AI

Large language models (LLMs) are increasingly being integrated into web browsers to create agentic browsing systems that execute actions on behalf of the user. Prior work considering the security of agentic browsers focuses exclusively on indirect prompt-injection attacks. However, by failing to consider traditional we…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.5

GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

2026-05-07 · Pranav Mantini, Shishir K. Shah

Research Track A

We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed in…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.8

EMO: Pretraining Mixture of Experts for Emergent Modularity

2026-05-07 · Ryan Wang, Akshita Bhagia, Sewon Min

General AI

Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset of experts per inpu…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.8

Quantifying Trade-Offs Between Stability and Goal-Obfuscation

2026-05-07 · Yixuan Wang, Dan Guralnik, Warren Dixon

General AI

Safety-critical autonomy in adversarial settings demands more than Lyapunov stability of tracking error signals. An agent executing a goal-directed trajectory is intrinsically legible to a passive observer running online Bayesian inference, because the contractive dynamics of any Lyapunov basin of attraction concentrat…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.8

Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models

2026-05-07 · Amir Ivry

General AI

Large audio language models (LALMs) are increasingly used to reason over long audio clips, yet deployment often compresses audio before inference to reduce memory and latency. The risk is that compression can leave aggregate accuracy acceptable while sharply degrading answers for a deployment-critical query family. We …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.5

Generative Quantum-inspired Kolmogorov-Arnold Eigensolver

2026-05-06 · Yu-Cheng Lin, Yu-Chao Hsu, I-Shan Tsai, Chun-Hua Lin, Kuo-Chung Peng, Jiun-Cheng Jiang, Yun-Yuan Wang, Tzung-Chi Huang, Tai-Yue Li, Kuan-Cheng Chen, Samuel Yen-Chi Chen, Nan-Yow Chen

General AI

High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum circuit simulation, and selected configuration interaction postprocessing. We present the generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE), a parameter-ef…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.0

RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation

2026-05-06 · Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov, Mikhail Kulakov

General AI

We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned har…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.0

CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs

2026-05-07 · Zhengru Fang, Yanan Ma, Yu Guo, Senkang Hu, Yixian Zhang, Hangcheng Cao, Wenbo Ding, Yuguang Fang

Research Track A · General AI

When a chest X-ray shows consolidation but the question asks which finding is present, a medical vision-language model may answer "No consolidation." This is more than an incorrect choice: it is a polarity reversal that emits a clinical statement contradicting the image. We study this failure as negated-option attracti…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.0

TIDE: Every Layer Knows the Token Beneath the Context

2026-05-07 · Ajay Jaiswal, Lauren Hannah, Han-Byul Kim, Duc Hoang, Mehrdad Farajtabar, Minsik Cho

General AI

We revisit a universally accepted but under-examined design choice in every modern LLM: a token index is looked up once at the input embedding layer and then permanently discarded. This single-injection assumption induces two structural failures: (i) the Rare Token Problem, where a Zipf-type distribution of vocabulary …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Relit-LiVE: Relight Video by Jointly Learning Environment Video

2026-05-07 · Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang

General AI

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decompositio…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Rethinking Adapter Placement: A Dominant Adaptation Module Perspective

2026-05-07 · Suoxin Zhang, Run He, Di Fang, Xiang Tan, Kaixuan Chen, Huiping Zhuang

General AI

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method that places trainable low-rank adapters into frozen pre-trained models. Recent studies show that using fewer LoRA adapters may still maintain or even improve performance, but existing methods still distribute adapters broadly, leaving wh…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

2026-05-07 · Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier

General AI

Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

2026-05-06 · Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen, Jun Zhu

General AI

LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

PianoCoRe: Combined and Refined Piano MIDI Dataset

2026-05-07 · Ilya Borovik

General AI

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-sc…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.0

Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance

2026-05-07 · Ziyun Zeng, Yiqi Lin, Guoqiang Liang, Mike Zheng Shou

General AI

In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly available datasets predominantly focus on local editing or style transfer, which largely preserve the original scene structure and are easier to scale. In contrast, Backgroun…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

2026-05-07 · Sushant Gautam, Finn Schwall, Annika Willoch Olstad, Fernando Vallecillos Ruiz, Birk Torpmann-Hagen, Sunniva Maria Stordal Bjørklund, Leon Moonen, Klas Pettersen, Michael A. Riegler

General AI

Many deployments must compare candidate language models for safety before a labeled benchmark exists for the relevant language, sector, or regulatory regime. We formalize this setting as benchmarkless comparative safety scoring and specify the contract under which a scenario-based audit can be interpreted as deployment…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.2

PriorNet: Prior-Guided Engagement Estimation from Face Video

2026-05-05 · Alexander Vedernikov

General AI

Engagement estimation from face video remains challenging because facial evidence is often incomplete, labeled data are limited, and engagement annotations are subjective. We present PriorNet, a prior-guided framework that injects task-relevant priors at three stages of the pipeline: preprocessing, model adaptation, an…

Review
pending
Role
unreviewed
Read
later
huggingface Score 7.0

RemoteZero: Geospatial Reasoning with Zero Human Annotations

2026-05-06 · Liang Yao, Fan Liu, Shengxiang Xu, Chuanyi Zhang, Rui Min, Shimin Di, Yuhui Zheng

General AI

Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has liberated the reasoning path from manual curation, allowing models to generate their own inference chains. Yet a final dependency remains: they are still sup…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.8

From Review to Design: Ethical Multimodal Driver Monitoring Systems for Risk Mitigation, Incident Response, and Accountability in Automated Vehicles

2026-05-07 · Bilal Khana, Waseem Shariff, Rory Coyne, Muhammad Ali Farooq, Peter Corcoran

General AI

As vehicles transition toward higher levels of automation, Driver Monitoring Systems (DMS) have become essential for ensuring human oversight, safety, and regulatory compliance in a vehicle. These systems rely on multimodal sensing and AI-driven inference to assess driver attention, cognitive state, and readiness to ta…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.4

The Scaling Properties of Implicit Deductive Reasoning in Transformers

2026-05-05 · Enrico Vompa, Tanel Tammet

General AI

We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning app…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.0

When to Trust Imagination: Adaptive Action Execution for World Action Models

2026-05-07 · Rui Wang, Yue Zhang, Jiehong Lin, Kuncheng Luo, Jianan Wang, Zhongrui Wang, Xiaojuan Qi

General AI

World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and future actions. However, current WAMs typically execute a fixed number of predicted actions after each model inference, leaving the robot blind to whether the imagined f…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

HUGO-CS: A Hybrid-Labeled, Uncertainty-Aware, General-Purpose, Observational Dataset for Cold Spray

2026-05-05 · Stephen Price, Kyle Miller, Marco Musto, Kenneth Kroenlein, James Saal, Kyle Tsaknopoulos, Elke A. Rundensteiner, Danielle L. Cote

General AI

Cold spraying is an increasingly common approach for repairing and manufacturing components due to its solid-state manufacturing capabilities. However, process optimization remains difficult due to many interdependent parameters and the lack of large-scale, machine-readable data to support modeling. While the scientifi…

Review
pending
Role
unreviewed
Read
later