Research Paper Cockpit

Needs Review

Unresolved papers that are still in your triage queue.

Daily Archives

Quick jump into generated daily digests.

Research Workflow

Latest digest: 2026-03-28.

Papers

102 visible entries

arxiv Score 31.5

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

2026-03-12 · Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin

Research Track A · General AI

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophi…

Review: pending · Role: unreviewed · Read: now

arxiv Score 29.4

ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents

2026-03-20 · Zijian Lu, Yiping Zuo, Yupeng Nie, Xin He, Weibei Fan, Chen Dai

Research Track B · General AI

Despite rapid progress in multimodal GUI agents, reusable skill acquisition remains difficult because on-demand generated skills often leave action semantics, state assumptions, and success criteria implicit. This makes them brittle to execution errors, hard to verify, and difficult to repair. We present ContractSkill,…

Review: pending · Role: unreviewed · Read: now

arxiv Score 25.9

Universe Routing: Why Self-Evolving Agents Need Epistemic Control

2026-03-16 · Zhaohui Geoffrey Wang

Research Track A · General AI

A critical failure mode of current lifelong agents is not lack of knowledge, but the inability to decide how to reason. When an agent encounters "Is this coin fair?" it must recognize whether to invoke frequentist hypothesis testing or Bayesian posterior inference, frameworks that are epistemologically incompatible. M…

Review: pending · Role: unreviewed · Read: now

arxiv Score 24.8

Enhancing Web Agents with a Hierarchical Memory Tree

2026-03-07 · Yunteng Tan, Zhi Gao, Xinxiao Wu

Research Track B · General AI

Large language model-based web agents have shown strong potential in automating web interactions through advanced reasoning and instruction following. While retrieval-based memory derived from historical trajectories enables these agents to handle complex, long-horizon tasks, current methods struggle to generalize acro…

Review: pending · Role: unreviewed · Read: now

arxiv Score 24.2

A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

2026-03-20 · Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette

Research Track B · General AI

Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing L…

Review: pending · Role: unreviewed · Read: now

arxiv Score 24.0

Continual Learning in Large Language Models: Methods, Challenges, and Opportunities

2026-03-13 · Hongyang Chen, Zhongwu Sun, Hongfei Ye, Kunchi Li, Xuemin Lin

Research Track A · General AI

Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting, a critical limitation of the static pre-training paradigm inherent to modern LLMs. This survey presents a comprehensiv…

Review: pending · Role: unreviewed · Read: now

arxiv Score 23.8

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

2026-03-23 · Shoubin Yu, Lei Shu, Antoine Yang, Yao Fu, Srinivas Sunkara, Maria Wang, Jindong Chen, Mohit Bansal, Boqing Gong

Research Track B · General AI

Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution. However, current web-agent benchmarks suffer from a critical limitation: they focus entirely on web-based interaction and perception, lacking grounding in the user's real-world physical surroundings. This li…

Review: pending · Role: unreviewed · Read: now

huggingface Score 23.8

FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

2026-03-26 · Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Yong Liu, Chi Zhang

General AI

This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) in solving real-world financial problems through tool invocation of financial model context protocols. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic us…

Review: pending · Role: unreviewed · Read: now

arxiv Score 21.6

Demographic Fairness in Multimodal LLMs: A Benchmark of Gender and Ethnicity Bias in Face Verification

2026-03-26 · Ünsal Öztürk, Hatef Otroshi Shahreza, Sébastien Marcel

General AI

Multimodal Large Language Models (MLLMs) have recently been explored as face verification systems that determine whether two face images are of the same person. Unlike dedicated face recognition systems, MLLMs approach this task through visual prompting and rely on general visual and reasoning abilities. However, the d…

Review: pending · Role: unreviewed · Read: now

arxiv Score 21.6

ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents

2026-03-26 · Cristian Lupascu, Alexandru Lupascu

Research Track A · General AI

Large Language Model-based agents increasingly operate in high-stakes, multi-turn settings where factual grounding is critical, yet their memory systems typically rely on flat key-value stores or plain vector retrieval with no mechanism to track the provenance or trustworthiness of stored knowledge. We present Elephant…

Review: pending · Role: unreviewed · Read: now

arxiv Score 21.5

AI Planning Framework for LLM-Based Web Agents

2026-03-13 · Orit Shahnovsky, Rotem Dror

Research Track B · General AI

Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why they fail or how they plan. This paper addresses this gap by formally treating web tasks as sequ…

Review: pending · Role: unreviewed · Read: now

huggingface Score 20.8

Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

2026-03-26 · Dingjie Song, Tianlong Xu, Yi-Fan Zhang, Hang Li, Zhiling Yan, Xing Fan, Haoyang Li, Lichao Sun, Qingsong Wen

General AI

Assessing student handwritten scratchwork is crucial for personalized educational feedback but presents unique challenges due to diverse handwriting, complex layouts, and varied problem-solving approaches. Existing educational NLP primarily focuses on textual responses and neglects the complexity and multimodality inhe…

Review: pending · Role: unreviewed · Read: now

arxiv Score 20.6

Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos

2026-03-26 · Abdullah Hamdi, Changchun Yang, Xin Gao

General AI

Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic …

Review: pending · Role: unreviewed · Read: now

arxiv Score 20.6

Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?

2026-03-26 · Liang Zhang, Yu Fu, Xinyi Jin

General AI

Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship us…

Review: pending · Role: unreviewed · Read: now

arxiv Score 20.6

LanteRn: Latent Visual Structured Reasoning

2026-03-26 · André G. Viveiros, Nuno Gonçalves, Matthias Lindemann, André Martins

General AI

While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. While recent approaches…

Review: pending · Role: unreviewed · Read: now

arxiv Score 20.6

SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding

2026-03-26 · Jiwook Han, Geo Ahn, Youngrae Kim, Jinwoo Choi

General AI

Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memorize dataset-specific s…

Review: pending · Role: unreviewed · Read: now

arxiv Score 20.6

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

2026-03-26 · Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, Guanjun Jiang

General AI

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or seq…

Review: pending · Role: unreviewed · Read: now

arxiv Score 20.2

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

2026-03-15 · Mohamed Aghzal, Gregory J. Stein, Ziyu Yao

Research Track B · General AI

Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze w…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.6

Seeing to Ground: Visual Attention for Hallucination-Resilient MDLLMs

2026-03-26 · Vishal Narnaware, Animesh Gupta, Kevin Zhai, Zhenyi Wang, Mubarak Shah

General AI

Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet the architectures remain prone to multimodal hallucinations. This structural vulnerability stems from an algorithmic flaw: the decoder ranks candidate tokens based on textual likelihood without …

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.5

ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution

2026-01-12 · Jihong Wang, Jiamu Zhou, Weiming Zhang, Weiwen Liu, Zhuosheng Zhang, Xingyu Lou, Weinan Zhang, Huarong Deng, Jun Wang

Research Track B · General AI

With the advancement of vision-language models, web automation has made significant progress. However, deploying autonomous agents in real-world settings remains challenging, primarily due to site heterogeneity, where generalist models lack domain-specific priors for diverse interfaces, and long-horizon instability, ch…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.5

Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents

2026-03-09 · Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang

Research Track B · General AI

Modern agents powered by thinking LLMs achieve high accuracy through long chain-of-thought reasoning but incur substantial inference costs. While many LLMs now support configurable reasoning levels (e.g., high/medium/low), static strategies are often ineffective: using low-effort modes at every step leads to significan…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.4

All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation

2026-03-15 · Xudong Wang, Gan Li, Zhiyu Liu, Yao Wang, Lianqing Liu, Zhi Han

Research Track A · General AI

Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong V…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.8

Environment Maps: Structured Environmental Representations for Long-Horizon Agents

2026-03-24 · Yenchia Feng, Chirag Sharma, Karime Maamari

Research Track B · General AI

Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows remains an open problem. In long-horizon settings, agents frequently suffer from cascading errors and environmental stochasticity; a single misstep in a dynamic interface can lead to task failure, resulting in h…

Review: pending · Role: unreviewed · Read: now

huggingface Score 18.8

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

2026-03-26 · Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang, Chao Zhang, Chen Zhang, Yuhang Zang, Fei Yuan, Jiakang Yuan, Jiashuo Yu, Jinhui Yin, Haochen Ye, Qian Yao, Bowen Yang, Danni Yang, Kaichen Yang, Ziang Yan, Jun Xu, Yicheng Xu, Wanghan Xu, Xuenan Xu, Chao Xu, Ruiliang Xu, Shuhao Xing, Long Xing, Xinchen Xie, Ling-I Wu, Zijian Wu, Zhenyu Wu, Lijun Wu, Yue Wu, Jianyu Wu, Wen Wu, Fan Wu, Xilin Wei, Qi Wei, Bingli Wang, Rui Wang, Ziyi Wang, Zun Wang, Yi Wang, Haomin Wang, Yizhou Wang, Lintao Wang, Yiheng Wang, Longjiang Wang, Bin Wang, Jian Tong, Zhongbo Tian, Huanze Tang, Chen Tang, Shixiang Tang, Yu Sun, Qiushi Sun, Xuerui Su, Qisheng Su, Chenlin Su, Demin Song, Jin Shi, Fukai Shang, Yuchen Ren, Pengli Ren, Xiaoye Qu, Yuan Qu, Jiantao Qiu, Yu Qiao, Runyu Peng, Tianshuo Peng, Jiahui Peng, Qizhi Pei, Zhuoshi Pan, Linke Ouyang, Wenchang Ning, Yichuan Ma, Zerun Ma, Ningsheng Ma, Runyuan Ma, Chengqi Lyu, Haijun Lv, Han Lv, Lindong Lu, Kuikun Liu, Jiangning Liu, Yuhong Liu, Kai Liu, Hongwei Liu, Zhoumianze Liu, Mengjie Liu, Ziyu Liu, Wenran Liu, Yang Liu, Liwei Liu, Kaiwen Liu, Junyao Lin, Junming Lin, Tianyang Lin, Dahua Lin, Jianze Liang, Linyang Li, Peiji Li, Zonglin Li, Zehao Li, Pengze Li, Guoyan Li, Lingkai Kong, Linglin Jing, Zhenjiang Jin, Feifei Jiang, Qian Jiang, Junhao Huang, Zixian Huang, Haian Huang, Zhouqi Hua, Han Hu, Linfeng Hou, Yinan He, Conghui He, Tianyao He, Xu Guo, Qipeng Guo, Aijia Guo, Yuzhe Gu, Lixin Gu, Jingyang Gong, Qiming Ge, Jiaye Ge, Songyang Gao, Jianfei Gao, Xinyu Fang, Caihua fan, Yue Fan, Yanhui Duan, Zichen Ding, Shengyuan Ding, Xuanlang Dai, Erfei Cui, Ganqu Cui, Pei Chu, Tao Chu, Guangran Cheng, Yu Cheng, Kai Chen, Yongkang Chen, Chiyu Chen, Guanzhou Chen, Qiaosheng Chen, Sitao Chen, Xin Chen, Haojiong Chen, Yicheng Chen, Weihan Cao, Yuhang Cao, Qinglong Cao, Lei Bai

General AI

We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is aug…

Review: pending · Role: unreviewed · Read: now

huggingface Score 18.4

MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution

2026-03-19 · Minhua Lin, Zhiwei Zhang, Hanqing Lu, Hui Liu, Xianfeng Tang, Qi He, Xiang Zhang, Suhang Wang

General AI

Memory-augmented LLM agents maintain external memory banks to support long-horizon interaction, yet most existing systems treat construction, retrieval, and utilization as isolated subroutines. This creates two coupled challenges: strategic blindness on the forward path of the memory cycle, where construction and retri…

Review: pending · Role: unreviewed · Read: now

huggingface Score 17.8

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

2026-03-26 · Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Yuanheng Zhu, Dongbin Zhao

General AI

On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matching to a one-token sig…

Review: pending · Role: unreviewed · Read: now

huggingface Score 17.4

IQuest-Coder-V1 Technical Report

2026-03-17 · Jian Yang, Wei Zhang, Shawn Guo, Zhengmao Ye, Lin Jing, Shark Liu, Yizhi Li, Jiajun Wu, Cening Liu, X. Ma, Yuyang Song, Siwei Wu, Yuwen Li, L. Liao, T. Zheng, Ziling Huang, Zelong Huang, Che Liu, Yan Xing, Renyuan Li, Qingsong Cai, Hanxu Yan, Siyue Wang, Shikai Li, Jason Klein Liu, An Huang, Yongsheng Kang, Jinxing Zhang, Chuan Hao, Haowen Wang, Weicheng Gu, Ran Tao, Mingjie Tang, Peihao Wu, Jianzhou Wang, Xianglong Liu, Weifeng Lv, Bryan Dai

General AI

In this report, we introduce the IQuest-Coder-V1 series (7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic through different phases of the pipe…

Review: pending · Role: unreviewed · Read: now

arxiv Score 16.6

Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

2026-03-26 · Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava

General AI

We present an empirical study of how far general-purpose coding agents, without hardware-specific training, can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage …

Review: pending · Role: unreviewed · Read: now

arxiv Score 16.6

Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming

2026-03-26 · Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz

General AI

Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-referenced overhead imagery, enabling GPS-denied localization and navigation. Existing methods almost universally formulate CVGL as an image-retrieval problem in a contrastively trained embedding space. This ties per…

Review: pending · Role: unreviewed · Read: now

arxiv Score 16.5

Pruned Adaptation Modules: A Simple yet Strong Baseline for Continual Foundation Models

2026-03-22 · Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren

Research Track A · General AI

The continual learning literature has rapidly shifted from traditional class-incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substanti…

Review: pending · Role: unreviewed · Read: now

arxiv Score 16.4

Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

2026-03-15 · Jiayuan Du, Yuebing Song, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu, Qijun Chen

Research Track A · General AI

End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded…

Review: pending · Role: unreviewed · Read: now

arxiv Score 16.0

Lifelong Embodied Navigation Learning

2026-03-06 · Xudong Wang, Jiahua Dong, Baichen Liu, Qi Lyu, Lianqing Liu, Zhi Han

Research Track A · General AI

Embodied navigation agents powered by large language models have shown strong performance on individual tasks but struggle to continually acquire new navigation skills and suffer from catastrophic forgetting. We formalize this challenge as lifelong embodied navigation learning (LENL), where an agent is required to a…

Review: pending · Role: unreviewed · Read: now

huggingface Score 15.8

BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment

2026-03-25 · Risa Shinoda, Kaede Shiohara, Nakamasa Inoue, Kuniaki Saito, Hiroaki Santo, Fumio Okura

General AI

Understanding animal species from multimodal data poses an emerging challenge at the intersection of computer vision and ecology. While recent biological models, such as BioCLIP, have demonstrated strong alignment between images and textual taxonomic information for species identification, the integration of the audio …

Review: pending · Role: unreviewed · Read: now

huggingface Score 15.8

Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

2026-03-25 · Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi, Jacob Andreas, Yoon Kim

General AI

Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mode. While this is generally not a problem for benchmark-style evaluations that assume one correct answer, many real-wor…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.6

ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents

2026-03-25 · Bingqing Wei, Zhongyu Xia, Dingai Liu, Xiaoyu Zhou, Zhiwei Lin, Yongtao Wang

General AI

Vision-language models (VLMs) have shown remarkable general capabilities, yet embodied agents built on them fail at complex tasks, often skipping critical steps, proposing invalid actions, and repeating mistakes. These failures arise from a fundamental gap between the static training data of VLMs and the physical inter…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.6

R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

2026-03-26 · Zirui Zhang, Haoyu Dong, Kexin Pei, Chengzhi Mao

General AI

Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify …

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.6

Vega: Learning to Drive with Natural Language Instructions

2026-03-26 · Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

General AI

Vision-language-action models have reshaped autonomous driving to incorporate language into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To addr…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.2

WebNavigator: Global Web Navigation via Interaction Graph Retrieval

2026-03-20 · Xuanwang Zhang, Yuteng Han, Jinnan Qi, Mulong Xie, Zhen Wu, Xinyu Dai

Research Track B · General AI

Despite significant advances in autonomous web navigation, current methods remain far from human-level performance in complex web environments. We argue that this limitation stems from Topological Blindness, where agents are forced to explore via trial-and-error without access to the global topological structure of the…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.0

Reframing Long-Tailed Learning via Loss Landscape Geometry

2026-03-22 · Shenghan Chen, Yiming Liu, Yanzhen Wang, Yujia Wang, Xiankai Lu

Research Track A · General AI

Balancing the performance trade-off on long-tail (LT) data distributions remains a long-standing challenge. In this paper, we posit that this dilemma stems from a phenomenon called "tail performance degradation" (the model tends to severely overfit on head classes while quickly forgetting tail classes) and pose a solution …

Review: pending · Role: unreviewed · Read: soon

arxiv Score 14.6

SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation

2026-03-25 · Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu

Research Track A · General AI

Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks remains fundamentally limited by the absence of long-term memory, causal failure attribution, and dynamic intervention c…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.6

Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

2026-03-26 · Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li

General AI

Human driving behavior is inherently personal: it is shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectiv…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.0

AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation

2026-03-22 · Liang Ding

Research Track B · General AI

LLM-as-Judge evaluation fails agent tasks because a fixed rubric cannot capture what matters for this task: code debugging demands Correctness and Error Handling; web navigation demands Goal Alignment and Action Efficiency. We present ADARUBRIC, which closes this gap by generating task-specific evaluation rubrics on th…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.8

Evidence of an Emergent "Self" in Continual Robot Learning

2026-03-25 · Adidev Jhunjhunwala, Judah Goldfeder, Hod Lipson

Research Track A

A key challenge to understanding self-awareness has been a principled way of quantifying whether an intelligent system has a concept of a "self," and if so how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of cognitive process th…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 13.6

Natural-Language Agent Harnesses

2026-03-26 · Linyue Pan, Lexiao Zou, Shuo Guo, Jingchen Ni, Hai-Tao Zheng

General AI

Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead be externaliz…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.6

Social Hippocampus Memory Learning

2026-03-26 · Liping Yi, Zhiming Zhao, Qinghua Hu

General AI

Social learning highlights that learning agents improve not in isolation, but through interaction and structured knowledge exchange with others. When introduced into machine learning, this principle gives rise to social machine learning (SML), where multiple agents collaboratively learn by sharing abstracted knowledge.…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.6

Wan-Weaver: Interleaved Multi-modal Generation via Decoupled Training

2026-03-26 · Jinbo Xing, Zeyinzi Jiang, Yuxiang Tuo, Chaojie Mao, Xiaotang Gai, Xi Chen, Jingfeng Zhang, Yulin Pan, Zhen Han, Jie Xiao, Keyu Yan, Chenwei Xie, Chongyang Zhong, Kai Zhu, Tong Shen, Lianghua Huang, Yu Liu, Yujiu Yang

General AI

Recent unified models have made unprecedented progress in both understanding and generation. However, while most of them accept multi-modal inputs, they typically produce only single-modality outputs. This challenge of producing interleaved content is mainly due to training data scarcity and the difficulty of modeling …

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.5

Bi-CRCL: Bidirectional Conservative-Radical Complementary Learning with Pre-trained Foundation Models for Class-incremental Medical Image Analysis

2026-03-24 · Xinyao Wu, Zhe Xu, Cheng Chen, Jiawei Ma, Yefeng Zheng, Raymond Kai-yu Tong

Research Track A · General AI

Class-incremental learning (CIL) in medical image-guided diagnosis requires retaining prior diagnostic knowledge while adapting to newly emerging disease categories, which is critical for scalable clinical deployment. This problem is particularly challenging due to heterogeneous data and privacy constraints that preven…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 13.4

ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

2026-03-19 · Haochen Zhao, Shaoyang Cui

Research Track B · General AI

Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap …

Review: pending · Role: unreviewed · Read: soon

arxiv Score 13.0

Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development

2026-03-04 · Hung Tran, Langston Nashold, Rayan Krishnan, Antoine Bigeard, Alex Gu

Research Track B · General AI

Code generation has emerged as one of AI's highest-impact use cases, yet existing benchmarks measure isolated tasks rather than the complete "zero-to-one" process of building a working application from scratch. We introduce Vibe Code Bench, a benchmark of 100 web application specifications (50 public validation, 50 hel…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 12.8

The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense

2026-03-24 · Qianlong Lan, Anuj Kaul

Research Track B · General AI

Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage spli…

Review: pending · Role: unreviewed · Read: now

arxiv Score 12.8

DIET: Learning to Distill Dataset Continually for Recommender Systems

2026-03-26 · Jiaqing Zhang, Hao Wang, Mingjia Yin, Bo Chen, Qinglin Jia, Rui Zhou, Ruiming Tang, ChaoYi Ma, Enhong Chen

Research Track A · General AI

Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture comparison or iteration is prohibitively expensive, severely slowing down model deve…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 12.6

Back to Basics: Revisiting ASR in the Age of Voice Agents

2026-03-26 · Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li, Alex Smola

General AI

Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which condi…

Review: pending · Role: unreviewed · Read: now
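Curated ASR benchmarks of the kind this abstract contrasts with real voice-agent conditions typically report accuracy as word error rate (WER). A minimal sketch of the standard word-level edit-distance computation, for reference; it is a generic metric, not the diagnostic tooling the paper introduces, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    # Guard against an empty reference to avoid division by zero.
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))  # 0.4
```

Here one deletion ("the") and one substitution ("lights" to "light") over five reference words give 2/5.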
arxiv Score 12.6

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

2026-03-26 · Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang, Mingxi Cheng, Qi Dai, Yuqing Yang, Lili Qiu, Zhendong Wang, Zhengyuan Yang, Xue Yang, Lijuan Wang, Ji Li, Chong Luo

General AI

Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-wo…

Review: pending · Role: unreviewed · Read: now
arxiv Score 12.6

No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models

2026-03-26 · Hai X. Pham, David T. Hoffmann, Ricardo Guerrero, Brais Martinez

General AI

Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard neg…

Review: pending · Role: unreviewed · Read: now
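For context on the hard-negative discussion: contrastive V&L models in the CLIP family are commonly trained with a symmetric InfoNCE loss, where the other captions in a batch act as negatives and hard-negative methods add deliberately confusable captions to that pool. A minimal NumPy sketch of that standard objective, assuming L2-normalized embeddings and an illustrative temperature of 0.07; it is not the paper's concept-centric method.

```python
import numpy as np

def clip_style_loss(image_emb: np.ndarray, text_emb: np.ndarray, temperature: float = 0.07) -> float:
    # L2-normalize so that dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # [N, N] similarity logits; matched image-caption pairs lie on the diagonal.
    logits = img @ txt.T / temperature

    def diag_cross_entropy(l: np.ndarray) -> float:
        # Row-wise log-softmax (stabilized), then the negative log-prob of the matched pair.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    # Symmetric: image-to-text over rows, text-to-image over columns.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))
```

Every off-diagonal caption in the batch is a negative here, which is why custom hard negatives change what the model learns.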
arxiv Score 12.6

Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment

2026-03-26 · Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang

General AI

The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteB…

Review: pending · Role: unreviewed · Read: now
arxiv Score 12.5

STEM Agent: A Self-Adapting, Tool-Enabled, Extensible Architecture for Multi-Protocol AI Agent Systems

2026-03-22 · Alfred Shen, Aaron Shen

Research Track A · General AI

Current AI agent frameworks commit early to a single interaction protocol, a fixed tool integration strategy, and static user models, limiting their deployment across diverse interaction paradigms. To address these constraints, we introduce STEM Agent (Self-adapting, Tool-enabled, Extensible, Multi-agent), a modular ar…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 12.0

AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

2026-03-22 · Liang Ding

Research Track B · General AI

LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER,…

Review: pending · Role: unreviewed · Read: soon
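The title points to hindsight experience replay (HER), which turns a failed episode into a successful one by relabeling it with the goal it actually reached. A minimal sketch of that generic relabeling step over agent trajectories; the Trajectory record and its achieved field are hypothetical, and how AgentHER derives an achieved goal is not visible in the truncated abstract.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

@dataclass(frozen=True)
class Trajectory:
    goal: str                # instruction the agent was given
    actions: List[str]       # actions it actually executed
    achieved: Optional[str]  # hypothetical: description of what it did accomplish
    success: bool            # whether the original goal was met

def hindsight_relabel(trajs: List[Trajectory]) -> List[Trajectory]:
    out: List[Trajectory] = []
    for t in trajs:
        out.append(t)  # keep the original trajectory as collected
        # A failed trajectory that accomplished something else becomes a
        # successful example for the goal it actually reached.
        if not t.success and t.achieved is not None and t.achieved != t.goal:
            out.append(replace(t, goal=t.achieved, success=True))
    return out

failed = Trajectory("book a flight to Paris", ["open site", "search flights"],
                    "search flights to Paris", success=False)
print(len(hindsight_relabel([failed])))  # 2
```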
arxiv Score 12.0

IntentWeave: A Progressive Entry Ladder for Multi-Surface Browser Agents in Cloud Portals

2026-03-24 · Wanying Mo, Jijia Lai, Xiaoming Wang

Research Track B · General AI

Browser agents built on LLMs can act in web interfaces, yet most remain confined to a single chat surface (e.g., a sidebar). This mismatch with real browsing can increase context-switching and reduce user control. We introduce IntentWeave, a design space of ten spatial paradigms for embedding agentic assistanc…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 11.8

SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce

2026-02-01 · Alberto Castelo, Zahra Zanjani Foumani, Ailin Fan, Keat Yang Koay, Vibhor Malik, Yuanzheng Zhu, Han Li, Meysam Feghhi, Ronie Uliana, Shuang Xie, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Lingyun Wang, Zhong Wu

Research Track B · General AI

A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model agents op…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 11.6

Beyond Benchmarks: How Users Evaluate AI Chat Assistants

2026-03-26 · Moiz Sadiq Awan, Muhammad Haris Noor, Muhammad Salman Munaf

Research Track A · General AI

Automated benchmarks dominate the evaluation of large language models, yet no systematic study has compared user satisfaction, adoption motivations, and frustrations across competing platforms using a consistent instrument. We address this gap with a cross-platform survey of 388 active AI chat users, comparing satisfac…

Review: pending · Role: unreviewed · Read: now
arxiv Score 11.6

Hierarchy-Guided Multimodal Representation Learning for Taxonomic Inference

2026-03-26 · Sk Miraj Ahmed, Xi Yu, Yunqi Li, Yuewei Lin, Wei Xu

General AI

Accurate biodiversity identification from large-scale field data is a foundational problem with direct impact on ecology, conservation, and environmental monitoring. In practice, the core task is taxonomic prediction - inferring order, family, genus, or species from imperfect inputs such as specimen images, DNA barcode…

Review: pending · Role: unreviewed · Read: now
arxiv Score 11.6

RefAlign: Representation Alignment for Reference-to-Video Generation

2026-03-26 · Lei Wang, YuXin Song, Ge Wu, Haocheng Feng, Hang Zhou, Jingdong Wang, Yaxing Wang, jian Yang

General AI

Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applications such as personalized advertising and virtual try-on. In practice, existing R2V methods typically introduce additional high-level seman…

Review: pending · Role: unreviewed · Read: now
arxiv Score 11.5

Safe and Scalable Web Agent Learning via Recreated Websites

2026-03-11 · Hyungjoo Chae, Jungsoo Park, Alan Ritter

Research Track B · General AI

Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites in…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 11.5

Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

2026-03-24 · Connor Mclaughlin, Nigel Lee, Lili Su

Research Track A

Machine learning models often need to adapt to new data after deployment due to structured or unstructured real-world dynamics. The Continual Learning (CL) framework enables continuous model adaptation, but most existing approaches either assume each task contains sufficiently many data samples or that the learning tas…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 11.4

Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

2026-03-14 · Seokmin Lee, Yunghee Lee, Byeonghyun Pak, Byeongju Woo

General AI

For robotic agents operating in dynamic environments, learning visual state representations from streaming video observations is essential for sequential decision making. Recent self-supervised learning methods have shown strong transferability across vision tasks, but they do not explicitly address what a good visual …

Review: pending · Role: unreviewed · Read: soon
huggingface Score 11.4

Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

2026-03-15 · Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee

General AI

Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improve LALM reasoning. We introduce three strategies using diverse i…

Review: pending · Role: unreviewed · Read: soon
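Inference-time steering is often implemented by adding a fixed direction to a layer's hidden states. A minimal sketch of one generic variant (a mean-difference steering vector), assuming hidden states are available as arrays; the layer, the contrastive example sets and the strength alpha are all assumptions, and the paper's three strategies are not visible in the truncated abstract. In a real model the shift would be applied inside the forward pass, for example through a framework hook.

```python
import numpy as np

def steering_vector(h_pos: np.ndarray, h_neg: np.ndarray) -> np.ndarray:
    # Mean hidden state on "desired" examples minus mean on baseline examples.
    # h_pos: [N_pos, D], h_neg: [N_neg, D] activations from one chosen layer.
    return h_pos.mean(axis=0) - h_neg.mean(axis=0)

def steer(hidden: np.ndarray, v: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    # Training-free: shift every token's hidden state ([T, D]) along v.
    return hidden + alpha * v
```

No weights change; only the activations at the chosen layer are shifted, which is what makes the approach training-free.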
arxiv Score 11.0

In-Browser Agents for Search Assistance

2026-01-14 · Saber Zerhoudi, Michael Granitzer

Research Track B · General AI

A fundamental tension exists between the demand for sophisticated AI assistance in web search and the need for user data privacy. Current centralized models require users to transmit sensitive browsing data to external services, which limits user control. In this paper, we present a browser extension that provides a vi…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 11.0

RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

2026-03-04 · Soroush Nasiriany, Sepehr Nasiriany, Abhiram Maddukuri, Yuke Zhu

Research Track A · General AI

Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present Rob…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 10.8

Privacy Practices of Browser Agents

2025-12-08 · Alisha Ukani, Hamed Haddadi, Ali Shahin Shamsabadi, Peter Snyder

Research Track B · General AI

This paper presents a systematic evaluation of the privacy behaviors and attributes of eight recent, popular browser agents. Browser agents are software that automate Web browsing using large language models and ancillary tooling. However, the automated capabilities that make browser agents powerful also make them high…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 10.8

Cognitive Dark Matter: Measuring What AI Misses

2026-03-03 · Patrick J. Mineault, Thomas L. Griffiths, Sean Escola

Research Track A · General AI

We propose that the jagged intelligence landscape of modern AI systems arises from a missing training signal that we call "cognitive dark matter" (CDM): brain functions that meaningfully shape behavior yet are hard to infer from behavior alone. We identify key CDM domains-metacognition, cognitive flexibility, episodic …

Review: pending · Role: unreviewed · Read: soon
arxiv Score 10.8

Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation

2026-03-23 · Donald Shenaj, Federico Errica, Antonio Carta

General AI

Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the pers…

Review: pending · Role: unreviewed · Read: soon
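To make the rank/memory trade-off concrete: a LoRA adapter on a frozen d_out x d_in weight trains B (d_out x r) and A (r x d_in), so its trainable parameter count is r(d_in + d_out) and grows linearly with the rank. The 1024 x 1024 layer below is illustrative, not a configuration from the paper.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    # B is [d_out, rank] and A is [rank, d_in]; the base weight W stays frozen.
    return rank * (d_in + d_out)

for r in (4, 16, 64):
    print(r, lora_trainable_params(1024, 1024, r))
# prints 4 8192, then 16 32768, then 64 131072
```

For comparison, the full 1024 x 1024 matrix has 1,048,576 entries, so even r = 64 trains 12.5% as many parameters for that layer; adapting ranks per layer, as the title suggests, redistributes this budget.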
arxiv Score 10.8

MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis

2026-03-24 · Wei Sun, Ting Wang, Xinran Tian, Wanshun Lan, Xuhan Feng, Haoyue Li, Fangxin Wang

General AI

Existing LLM-based Kubernetes diagnostic systems cannot learn from operational experience, operating on static knowledge bases without improving from past resolutions. We present MetaKube, an experience-aware LLM framework through three synergistic innovations: (1) an Episodic Pattern Memory Network (EPMN) that abstrac…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 10.6

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

2026-03-26 · Xiaofeng Mao, Shaohao Rui, Kaining Ying, Bo Zheng, Chuanhao Li, Mingmin Chi, Kaipeng Zhang

General AI

Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the…

Review: pending · Role: unreviewed · Read: soon
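The linear KV-cache growth named in the abstract follows from autoregressive attention storing one key and one value vector per past token in every layer. A back-of-the-envelope sketch; the layer count, head count, head size and 2-byte (fp16/bf16) elements are hypothetical values, not PackForcing's configuration.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # Factor 2: one key and one value vector per token, per layer, per KV head.
    return 2 * num_layers * num_tokens * num_kv_heads * head_dim * bytes_per_elem

# Hypothetical model: 30 layers, 12 KV heads of size 128, fp16 cache.
for n in (1000, 2000, 4000):
    print(n, kv_cache_bytes(n, 30, 12, 128) / 2**20)  # MiB
# prints 1000 175.78125, 2000 351.5625, 4000 703.125: doubles with the token count
```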
huggingface Score 9.8

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

2026-03-25 · Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi

General AI

Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 9.8

GridVAD: Open-Set Video Anomaly Detection via Spatial Reasoning over Stratified Frame Grids

2026-03-26 · Mohamed Eltahir, Ahmed O. Ibrahim, Obada Siralkhatim, Tabarak Abdallah, Sondos Mohamed

Research Track A · General AI

Vision-Language Models (VLMs) are powerful open-set reasoners, yet their direct use as anomaly detectors in video surveillance is fragile: without calibrated anomaly priors, they alternate between missed detections and hallucinated false alarms. We argue the problem is not the VLM itself but how it is used. VLMs should…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 9.6

Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level Dropin & Neuroplasticity Mechanisms

2026-03-25 · Yupei Li, Shuaijie Shao, Manuel Milling, Björn Schuller

General AI

Current audio deepfake detection has achieved remarkable performance using diverse deep learning architectures such as ResNet, and has seen further improvements with the introduction of large models (LMs) like Wav2Vec. The success of large language models (LLMs) further demonstrates the benefits of scaling model parame…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 9.6

Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers

2026-03-26 · Mingmeng Geng, Yuhang Dong, Thierry Poibeau

General AI

Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Due to the…

Review: pending · Role: unreviewed · Read: soon
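The word-usage shift described above comes down to comparing how often a word appears in titles across periods. A minimal sketch of the per-batch statistic (share of titles containing a whole word, case-insensitive); the sample titles are taken from this listing purely for illustration, and a real analysis would compute the share per year over a large corpus.

```python
import re
from typing import Iterable

def title_share(titles: Iterable[str], word: str) -> float:
    # Fraction of titles that contain `word` as a whole word, ignoring case.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    titles = list(titles)
    if not titles:
        return 0.0
    return sum(1 for t in titles if pattern.search(t)) / len(titles)

sample = [
    "Beyond Benchmarks: How Users Evaluate AI Chat Assistants",
    "Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers",
    "Safe and Scalable Web Agent Learning via Recreated Websites",
    "MegaFlow: Zero-Shot Large Displacement Optical Flow",
]
print(title_share(sample, "beyond"), title_share(sample, "via"))  # 0.5 0.5
```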
arxiv Score 9.6

PixelSmile: Toward Fine-Grained Facial Expression Editing

2026-03-26 · Jiabin Hua, Hengyuan Xu, Aojie Li, Wei Cheng, Gang Yu, Xingjun Ma, Yu-Gang Jiang

General AI

Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off b…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 9.6

Self-Improvement of Large Language Models: A Technical Overview and Future Outlook

2026-03-26 · Haoyan Yang, Mario Xerri, Solha Park, Huajian Zhang, Yiyang Feng, Sai Akhil Kogilathota, Jiawei Zhou

General AI

As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for further improvement. …

Review: pending · Role: unreviewed · Read: soon
arxiv Score 9.5

Anansi: Scalable Characterization of Message-Based Job Scams

2026-02-27 · Abisheka Pitumpe, Amir Rahmati

Research Track B · General AI

Job-based smishing scams, where victims are recruited under the guise of remote job opportunities, represent a rapidly growing and understudied threat within the broader landscape of online fraud. In this paper, we present Anansi, the first scalable, end-to-end measurement pipeline designed to systematically engage wit…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 9.0

Extending Precipitation Nowcasting Horizons via Spectral Fusion of Radar Observations and Foundation Model Priors

2026-03-23 · Yuze Qin, Qingyong Li, Zhiqing Guo, Wen Wang, Yan Liu, Yangli-ao Geng

General AI

Precipitation nowcasting is critical for disaster mitigation and aviation safety. However, radar-only models frequently suffer from a lack of large-scale atmospheric context, leading to performance degradation at longer lead times. While integrating meteorological variables predicted by weather foundation models offers…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.8

Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

2026-03-23 · Alexandra Zelenin, Alexandra Zhuravlyova

General AI

Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction, but its forward pass requires the row-wise norm of W + sBA, a computation that every major framework we surveyed implements by materializing the dense [d_out, d_in] product BA. At d_in = 8192 and rank r = 384, a sin…

Review: pending · Role: unreviewed · Read: now
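The row-wise norm can be obtained without forming BA by expanding ||W_i + s B_i A||^2 = ||W_i||^2 + 2s <W_i, B_i A> + s^2 B_i (A A^T) B_i^T, which only needs [d_out, r] and [r, r] intermediates. A minimal NumPy sketch of that identity next to the dense reference path, assuming W is [d_out, d_in], A is [r, d_in] and B is [d_out, r]; the function names are hypothetical and this is not the paper's fused kernel.

```python
import numpy as np

def dense_row_norms(W: np.ndarray, A: np.ndarray, B: np.ndarray, s: float) -> np.ndarray:
    # Reference path: materializes the dense [d_out, d_in] product B @ A.
    return np.linalg.norm(W + s * (B @ A), axis=1)

def factored_row_norms(W: np.ndarray, A: np.ndarray, B: np.ndarray, s: float) -> np.ndarray:
    # Row i: ||W_i + s B_i A||^2 = ||W_i||^2 + 2s <W_i, B_i A> + s^2 ||B_i A||^2
    WAt = W @ A.T   # [d_out, r]: <W_i, B_i A> = <(W A^T)_i, B_i>
    G = A @ A.T     # [r, r] Gram matrix of A: ||B_i A||^2 = B_i G B_i^T
    base = np.einsum("ij,ij->i", W, W)
    cross = np.einsum("ij,ij->i", WAt, B)
    lora = np.einsum("ij,ij->i", B @ G, B)
    sq = base + 2.0 * s * cross + s**2 * lora
    # Clamp tiny negatives caused by floating-point cancellation before the sqrt.
    return np.sqrt(np.maximum(sq, 0.0))
```

The expanded form can lose precision when the cross term is large and negative, which is why a production kernel would need to watch accumulation precision.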
huggingface Score 8.8

AVControl: Efficient Framework for Training Audio-Visual Controls

2026-03-25 · Matan Ben-Yosef, Tavi Halperin, Naomi Ken Korem, Mohammad Salama, Harel Cain, Asaf Joseph, Anthony Chen, Urska Jelercic, Ofir Bibi

General AI

Controlling video and audio generation requires diverse modalities, from depth and pose to camera trajectories and audio transformations, yet existing approaches either train a single monolithic model for a fixed set of controls or introduce costly architectural changes for each new modality. We introduce AVControl, a …

Review: pending · Role: unreviewed · Read: soon
huggingface Score 8.8

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

2026-03-25 · Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna

General AI

Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or …

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.6

A Mentalistic Interface for Probing Folk-Psychological Attribution to Non-Humanoid Robots

2026-03-26 · Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier, Andrea Dan Ryals, Lorenzo Pollini, Mario G. C. A. Cimino

General AI

This paper presents an experimental platform for studying intentional-state attribution toward a non-humanoid robot. The system combines a simulated robot, realistic task environments, and large language model-based explanatory layers that can express the same behavior in mentalistic, teleological, or mechanistic terms…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.6

Measuring What Matters -- or What's Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors

2026-03-26 · Cole Walsh, Rodica Ivan

General AI

Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable or superior to those of trained human raters, but have frequently been demonstrated to be vulnerable to the influence of construct-i…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.8

Parameter-Efficient Fine-Tuning for Medical Text Summarization: A Comparative Study of Lora, Prompt Tuning, and Full Fine-Tuning

2026-03-23 · Ulugbek Shernazarov, Rostislav Svitsov, Bin Shi

General AI

Fine-tuning large language models for domain-specific tasks such as medical text summarization demands substantial computational resources. Parameter-efficient fine-tuning (PEFT) methods offer promising alternatives by updating only a small fraction of parameters. This paper compares three adaptation approaches-Low-Ran…

Review: pending · Role: unreviewed · Read: now
arxiv Score 7.8

Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion

2026-03-23 · Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

General AI

Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit gener…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.6

Designing Any Imaging System from Natural Language: Agent-Constrained Composition over a Finite Primitive Basis

2026-03-26 · Chengshuai Yang

General AI

Designing a computational imaging system -- selecting operators, setting parameters, validating consistency -- requires weeks of specialist effort per modality, creating an expertise bottleneck that excludes the broader scientific community from prototyping imaging instruments. We introduce spec.md, a structured specif…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.6

The Kitchen Loop: User-Spec-Driven Development for a Self-Evolving Codebase

2026-03-26 · Yannick Roy

General AI

Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User x 1000', where an L…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 6.8

Frequency Switching Mechanism for Parameter-Efficient Multi-Task Learning

2026-03-22 · Shih-Wen Liu, Yen-Chang Chen, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang

General AI

Multi-task learning (MTL) aims to enable a single model to solve multiple tasks efficiently; however, current parameter-efficient fine-tuning (PEFT) methods remain largely limited to single-task adaptation. We introduce Free Sinewich, a parameter-efficient multi-task learning framework that enables near-zero-c…

Review: pending · Role: unreviewed · Read: now
huggingface Score 6.8

Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

2026-03-25 · Danil Tokhchukov, Aysel Mirzoeva, Andrey Kuznetsov, Konstantin Sobolev

General AI

In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can significantly improve the performance of DiT blocks. Building on this i…

Review: pending · Role: unreviewed · Read: later
huggingface Score 6.8

WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching

2026-03-25 · Yihan Wang, Jia Deng

General AI

We introduce WAFT-Stereo, a simple and effective warping-based method for stereo matching. WAFT-Stereo demonstrates that cost volumes, a common design used in many leading methods, are not necessary for strong performance and can be replaced by warping with improved efficiency. WAFT-Stereo ranks first on ETH3D, KITTI a…

Review: pending · Role: unreviewed · Read: later
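Since the claim is that warping can replace cost volumes, here is the basic operation: backward-warping right-view features into the left view with a per-pixel disparity. A minimal NumPy sketch assuming a rectified pair, the convention that left pixel x matches right pixel x - d, linear interpolation along rows and clamped borders; it is not the WAFT-Stereo architecture.

```python
import numpy as np

def warp_right_to_left(right: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """right: [H, W] or [H, W, C] right-view features; disparity: [H, W] in the left view."""
    H, W = disparity.shape
    # Left pixel (y, x) samples right pixel (y, x - d); clamp to the image border.
    xs = np.clip(np.arange(W)[None, :] - disparity, 0.0, W - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    w1 = xs - x0                    # weight of the right-hand neighbour
    rows = np.arange(H)[:, None]    # broadcasts against [H, W] column indices
    if right.ndim == 3:             # broadcast weights over a channel axis
        w1 = w1[..., None]
    return (1.0 - w1) * right[rows, x0] + w1 * right[rows, x1]
```

Comparing the warped features with the left-view features gives a per-pixel matching signal for a disparity update without building a full cost volume over all candidate disparities.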
huggingface Score 6.8

PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders

2026-03-26 · Niccolò Cavagnero, Narges Norouzi, Gijs Dubbelman, Daan de Geus

General AI

Vision Foundation Models (VFMs) pre-trained at scale enable a single frozen encoder to serve multiple downstream tasks simultaneously. Recent VFM-based encoder-only models for image and video segmentation, such as EoMT and VidEoMT, achieve competitive accuracy with remarkably low latency, yet they require finetuning th…

Review: pending · Role: unreviewed · Read: later
huggingface Score 6.8

S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

2026-03-26 · Ligong Han, Hao Wang, Han Gao, Kai Xu, Akash Srivastava

General AI

Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressi…

Review: pending · Role: unreviewed · Read: later
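For reference, the baseline the abstract critiques can be sketched as one step of confidence-thresholded parallel decoding: commit every still-masked position whose top probability clears a threshold tau, plus at least one position so decoding always progresses. A minimal NumPy sketch with hypothetical names; it is not S2D2's self-speculative procedure.

```python
import numpy as np

def threshold_commit(probs: np.ndarray, masked: np.ndarray, tau: float = 0.9):
    """One denoising step within a block.

    probs:  [L, V] predicted token distributions for the block's positions
    masked: [L] bool, True where the position is still undecided
    Returns (tokens, commit): argmax tokens and the positions finalized this step.
    """
    tokens = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    commit = masked & (conf >= tau)
    if masked.any() and not commit.any():
        # Guarantee progress: commit the single most confident masked position.
        best = np.flatnonzero(masked)[np.argmax(conf[masked])]
        commit[best] = True
    return tokens, commit
```

Lowering tau commits more tokens per step (fewer steps) but admits less certain tokens, which is the trade-off that makes the few-step regime brittle.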
arxiv Score 6.6

OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding

2026-03-25 · Xiaoyu Tang, Jun Dong, Jintao Cheng, Rui Fan

General AI

Remote sensing visual grounding (RSVG) aims to localize specific targets in remote sensing images using natural language expressions. However, existing methods are restricted to single-sensor domains, i.e., either optical or synthetic aperture radar (SAR), limiting their real-world applicability. In this paper, we intr…

Review: pending · Role: unreviewed · Read: later
arxiv Score 6.6

Conchordal: Emergent Harmony via Direct Cognitive Coupling in a Psychoacoustic Landscape

2026-03-26 · Koichi Takahashi

General AI

This paper introduces Conchordal, a bio-acoustic instrument for generative composition whose sonic agents are governed by artificial life dynamics within a psychoacoustic fitness landscape. The system is built on Direct Cognitive Coupling (DCC), a design principle requiring that generative dynamics operate directly wit…

Review: pending · Role: unreviewed · Read: later
arxiv Score 6.6

MegaFlow: Zero-Shot Large Displacement Optical Flow

2026-03-26 · Dingxi Zhang, Fangjinhua Wang, Marc Pollefeys, Haofei Xu

General AI

Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow…

Review: pending · Role: unreviewed · Read: later
arxiv Score 6.6

On the Formalization of Network Topology Matrices in HOL

2026-03-26 · Kubra Aksoy, Adnan Rashid, Osman Hasan, Sofiene Tahar

General AI

Network topology matrices are algebraic representations of graphs that are widely used in modeling and analysis of various applications including electrical circuits, communication networks and transportation systems. In this paper, we propose to use Higher-Order-Logic (HOL) based interactive theorem proving to formali…

Review: pending · Role: unreviewed · Read: later
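As a concrete instance of a network topology matrix, the directed incidence matrix has one row per node and one column per edge. A minimal NumPy sketch assuming the convention +1 at an edge's source and -1 at its target and no self-loops; it illustrates the kind of object being formalized, not the paper's HOL development. Two standard properties are easy to check numerically: every column sums to zero, and M M^T is the Laplacian of the underlying undirected graph.

```python
import numpy as np

def incidence_matrix(num_nodes: int, edges: list) -> np.ndarray:
    # Column k describes edge k = (u, v): +1 at the source u, -1 at the target v.
    M = np.zeros((num_nodes, len(edges)), dtype=int)
    for k, (u, v) in enumerate(edges):
        M[u, k] = 1
        M[v, k] = -1
    return M

# Directed triangle 0 -> 1 -> 2 -> 0
M = incidence_matrix(3, [(0, 1), (1, 2), (2, 0)])
L = M @ M.T  # degrees on the diagonal, -1 for each adjacent pair
```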
arxiv Score 6.6

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

2026-03-26 · Yawen Luo, Xiaoyu Shi, Junhao Zhuang, Yutian Chen, Quande Liu, Xintao Wang, Pengfei Wan, Tianfan Xue

General AI

Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. By reformulat…

Review: pending · Role: unreviewed · Read: later
arxiv Score 5.8

FCL-COD: Weakly Supervised Camouflaged Object Detection with Frequency-aware and Contrastive Learning

2026-03-24 · Jingchen Ni, Quan Zhang, Dan Jiang, Keyu Lv, Ke Zhang, Chun Yuan

General AI

Existing camouflaged object detection (COD) methods typically rely on fully-supervised learning guided by mask annotations. However, obtaining mask annotations is time-consuming and labor-intensive. Compared to fully-supervised methods, existing weakly-supervised COD methods exhibit significantly poorer performance. Eve…

Review: pending · Role: unreviewed · Read: later
huggingface Score 5.4

Representation Alignment for Just Image Transformers is not Easier than You Think

2026-03-15 · Jaeyo Shin, Jiwook Kim, Hyunjung Shim

General AI

Representation Alignment (REPA) has emerged as a simple way to accelerate Diffusion Transformer training in latent space. At the same time, pixel-space diffusion transformers such as Just image Transformers (JiT) have attracted growing attention because they remove a dependency on a pretrained tokenizer, and then avoi…

Review: pending · Role: unreviewed · Read: later
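For context, REPA as commonly described adds a regularizer that aligns projected intermediate diffusion-transformer tokens with patch features from a frozen pretrained encoder by maximizing their cosine similarity. A minimal NumPy sketch of that term; the single projection matrix stands in for the usual small MLP, the shapes are assumptions, and nothing here reflects the paper's findings for JiT.

```python
import numpy as np

def repa_loss(hidden: np.ndarray, target: np.ndarray, proj: np.ndarray) -> float:
    """hidden: [N, D] transformer tokens at one layer; target: [N, E] frozen encoder
    patch features; proj: [D, E] trainable projection (an MLP in practice)."""
    h = hidden @ proj
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    # Negative mean patch-wise cosine similarity; added to the diffusion loss with a weight.
    return float(-np.mean(np.sum(h * t, axis=1)))
```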