Deep Reads

arxiv Score 37.0

Neural Subspace Reallocation: Continual Learning as Retrieval-Based Subspace Memory Management

2026-06-29 · Byeong Hoon Yoon

Research Track A · General AI

We introduce Neural Subspace Reallocation (NSR), which reframes continual learning as memory management over parameter subspaces. Instead of treating Low-Rank Adaptation (LoRA) modules as disposable per-task adapters, NSR manages them as compressible, retrievable memory units on a frozen backbone through a recurring cy…

arxiv Score 36.4

RECALL: Recovery Experience Collection for Active Lifelong Learning in Vision-Language-Action Models

2026-06-22 · Ulas Berk Karli, Tesca Fitzgerald

Research Track A · General AI

Vision-Language-Action (VLA) models are commonly fine-tuned through passive imitation learning, where additional demonstrations are collected for tasks where the policy performs poorly. This approach incurs several downsides: it requires the robot to fail before data collection is triggered, provides little guidance ab…

arxiv Score 35.0

Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention

2026-05-12 · Hamza Ahmed Durrani, Rafay Suleman Durrani

Research Track A · General AI

Large language-vision models (LVLMs) such as CLIP, Flamingo, and BLIP have revolutionized AI by enabling understanding across textual and visual modalities. These models excel at tasks like image captioning, visual question answering, and cross-modal retrieval. However, they face catastrophic forgetting when learning n…

arxiv Score 34.5

Modular Continual Learning via Zero-Leakage Reconstruction Routing and Autonomous Task Discovery

2026-04-15 · Noureddine Kermiche

Research Track A · General AI

Catastrophic forgetting remains a primary hurdle in sequential task learning for artificial neural networks. We propose a silicon-native modular architecture that achieves structural parameter isolation using Task-Specific Experts and a distributed, outlier-based Gatekeeper. Moving beyond traditional sequential consoli…

arxiv Score 30.0

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

2026-03-12 · Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin

Research Track A · General AI

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophi…

arxiv Score 29.5

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

2026-06-05 · Rahul Nair, Chun Tao

Research Track A · General AI

Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B) on mathematical reasoning tasks and uncover a critical vulnerability: Full Fine-Tuning (F…

arxiv Score 29.0

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

2026-04-17 · Alexandra Dragomir, Ioana Pintilie, Antonio Barbalau, Marius Dragoi, Florin Brad, Cristian Daniel Paduraru, Alexandru Tifrea, Elena Burceanu, Radu Tudor Ionescu

Research Track A · General AI

Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs), by sequentially learning a low-rank update matrix for each task. To mitigate catastrophic forgetting, state-of-the-art approaches impose constraints on new adapters with respect to the previous ones,…

arxiv Score 28.5

Structured Distillation of Web Agent Capabilities Enables Generalization

2026-04-09 · Xing Han Lù, Siva Reddy

Research Track B · General AI

Frontier LLMs can navigate complex websites, but their cost and reliance on third-party APIs make local deployment impractical. We introduce Agent-as-Annotators, a framework that structures synthetic trajectory generation for web agents by analogy to human annotation roles, replacing the Task Designer, Annotator, and S…

huggingface Score 28.4

The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

2026-06-22 · Haggai Roitman

General AI

The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central thesis: building great agentic systems requires understanding every layer of the pipeline, not ju…

arxiv Score 27.5

ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents

2026-03-20 · Zijian Lu, Yiping Zuo, Yupeng Nie, Xin He, Weibei Fan, Chen Dai, Lianyong Qi, Shi Jin

Research Track B · General AI

Despite rapid progress in multimodal GUI agents, reusable skill acquisition remains difficult because on-demand generated skills often leave action semantics, state assumptions, and success criteria implicit. This makes them brittle to execution errors, hard to verify, and difficult to repair. We present ContractSkill,…

arxiv Score 27.5

FOGO: Forgetting-aware Orthogonalization Optimizer

2026-06-09 · Toan Nguyen, Yang Liu, Trung Le, Celso de Melo, Flora D. Salim

Research Track A · General AI

We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forgetting at every step. When such knowledge is never revisited, these losses compound into lon…

arxiv Score 27.0

Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection

2026-05-19 · Fatemeh Pesaran zadeh, Seyeon Choi, Xing Han Lù, Siva Reddy, Gunhee Kim

Research Track B · General AI

Large language models (LLMs) have enabled web agents that follow natural language goals through multi-step browser interactions. However, agents fine-tuned on specific trajectories and domains often struggle to generalize out of domain, and offline training can be compute-inefficient due to noisy, redundant trajectories…

arxiv Score 27.0

Towards Continual Motion-Language Agents: LoRA Variants for Incremental Motion Understanding and Generation

2026-06-29 · Bertram Taetz, Hugo Albuquerque Cosme da Silva, Gabriele Bleser-Taetz

Research Track A · General AI

Motion-language agents must possess the bidirectional capability to both understand human movement (motion-to-text, M2T) and generate it from natural language (text-to-motion, T2M). While foundational models have achieved strong performance in static settings, autonomous agents operating in dynamic environments must co…

huggingface Score 26.4

V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

2026-06-24 · Haoxiang Sun, Zhihang Yi, Langxuan Deng, Yuhao Zhou, Peiqi Jia, Jian Zhao, Li Yuan, Jiancheng Lv, Tao Wang

General AI

Fine-grained visual reasoning requires multimodal large language models (MLLMs) to identify task-relevant visual evidence and ground their reasoning in local image regions. Existing agentic methods typically rely on reinforcement learning with verifiable rewards or supervised fine-tuning on large-scale annotated reason…

arxiv Score 26.3

ATP-Bench: Towards Agentic Tool Planning for MLLM Interleaved Generation

2026-03-31 · Yinuo Liu, Zi Qian, Heng Zhou, Jiahao Zhang, Yajie Zhang, Zhihang Li, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang

General AI

Interleaved text-and-image generation represents a significant frontier for Multimodal Large Language Models (MLLMs), offering a more intuitive way to convey complex information. Current paradigms rely on either image generation or retrieval augmentation, yet they typically treat the two as mutually exclusive paths, fa…

arxiv Score 26.3

Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems

2026-04-22 · Pavel Salovskii, Iuliia Gorshkova

General AI

This paper presents a hybrid architecture for intelligent systems in which large language models (LLMs) are extended with an external ontological memory layer. Instead of relying solely on parametric knowledge and vector-based retrieval (RAG), the proposed approach constructs and maintains a structured knowledge graph …

arxiv Score 26.3

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

2026-06-08 · Hongcheng Gao, Hailong Qu, Jingyi Tang, Jiahao Wang, Zihao Huang, Hengkang Qiao, Shihong Huang, Junming Yang, Yi Li, Hongyixuan Yuan, Wenjie Li, Bohan Zeng, Wenbo Li, Bo Wang, Jianhui Liu, Olive Huang, Haoyang Huang, Wentao Zhang, Guoqing Huang, Nan Duan, Yinpeng Dong

General AI

Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, failing to assess general interactive spatial understan…

arxiv Score 26.0

Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes

2026-03-29 · Ashish Pandey

Research Track A

Sequential fine-tuning of pretrained language encoders often overwrites previously acquired capabilities, but the forgetting behavior of parameter-efficient updates remains under-characterized. We present a controlled empirical study of Low-Rank Adaptation (LoRA) in sequential transformer encoder fine-tuning with compa…

arxiv Score 26.0

WebChallenger: A Reliable and Efficient Generalist Web Agent

2026-06-09 · Jayoo Hwang, Xiaowen Zhang, Vedant Padwal

Research Track B · General AI

Autonomous web navigation remains challenging for LLM agents, and the strongest generalist systems rely on proprietary reasoning models whose inference cost is prohibitive for the repetitive tasks where such agents would be most useful. We argue this gap stems not from insufficient model capability but from agent archi…

arxiv Score 26.0

Are Online Skill and Memory Modules Always Worth Their Tokens? A Budget-Constrained Study of Web Agents

2026-06-12 · Sina Hajimiri, Masih Aminbeidokhti, Jose Dolz, Ismail Ben Ayed, Issam H. Laradji, Spandana Gella, Nicolas Gontier

Research Track B · General AI

Online web agents often augment a base actor with memory, workflow, or skill modules. These modules can improve performance, but they also consume test-time tokens, a cost rarely reported alongside the actor's inference cost. We study online augmentation, where this overhead is paid on every task, and re-evaluate its b…

arxiv Score 25.9

Lifelong In-Context Learning with Transformers Requires Parametric Forms of Attention

2026-06-24 · Luke McDermott, Robert W. Heath, Rahul Parhi

Research Track A · General AI

Lifelong continual learning remains an obstacle on the path to human-like intelligence. Modern transformers show sparks of intelligence with in-context learning. The quadratic nature of attention, however, prohibits transformers from performing this process on arbitrarily long sequences. In this work, we argue that ext…

arxiv Score 25.8

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

2026-05-21 · Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen, Zhengxi Lu, Fan Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao

General AI

The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs offer distinct advant…

arxiv Score 25.5

HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation

2026-04-20 · Lixian Chen, Jianhong Tan

Research Track A

Adapting foundation models under resource budgets relies heavily on Parameter-Efficient Fine-Tuning (PEFT), with LoRA being a standard modular solution. However, LoRA suffers from spectral interference. Low-rank updates often concentrate energy on the leading singular directions of pretrained weights, perturbing genera…

arxiv Score 25.0

Towards Lifelong Aerial Autonomy: Geometric Memory Management for Continual Visual Place Recognition in Dynamic Environments

2026-04-10 · Xingyu Shao, Zhiqiang Yan, Liangzheng Sun, Mengfan He, Chao Chen, Jinhui Zhang, Chunyu Li, Ziyang Meng

Research Track A · General AI

Robust geo-localization in changing environmental conditions is critical for long-term aerial autonomy. While visual place recognition (VPR) models perform well when airborne views match the training domain, adapting them to shifting distributions during sequential missions triggers catastrophic forgetting. Existing co…

arxiv Score 25.0

Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

2026-04-23 · Nicolae Filat, Ahmed Hussain, Konstantinos Kalogiannis, Elena Burceanu

Research Track A · General AI

Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same stream can induce d…

arxiv Score 25.0

PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning

2026-05-01 · Beining Wu, Zihao Ding, Jun Huang

Research Track A · General AI

While current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) is built on the unverified assumption that routing isolates task-specific knowledge into disjoint experts, we argue that routing operates per-sample, while forgetting accumulates across the task sequence, and gr…

arxiv Score 25.0

CLaaS: Continual learning as a service for sample efficient online learning

2026-06-04 · Kion Fallah, Silen Naihin, Barak Widawsky, Qingqing Mao

Research Track A · General AI

Deployed large language model agents must adapt to distribution shift in dynamic environments. Ideally, adaptation can be performed from accumulated agent experiences and retain prior capabilities while transferring to future tasks. However, agent actions and environmental transitions can only be sampled once per scena…

huggingface Score 24.5

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

2026-04-15 · Xiaohua Wang, Muzhao Tian, Yuqi Zeng, Zisu Huang, Jiakang Yuan, Bowen Chen, Jingwen Xu, Mingbo Zhou, Wenhao Liu, Muling Wu, Zhengkang Guo, Qi Qian, Yifei Wang, Feiran Zhang, Ruicheng Yin, Shihan Dou, Changze Lv, Tao Chen, Kaitao Song, Xu Tan, Tao Gui, Xiaoqing Zheng, Xuanjing Huang

General AI

Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and multimodal large language models (MLLMs) toward human-preferred behaviors. However, these approaches introduce a systemic vulnerability: reward hacking, where models exploit…

huggingface Score 24.5

MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism

2026-06-05 · Cong Chen, Guo Gan, Kaixiang Ji, ChaoYang Zhang, Zhen Yang, Guangming Yao, Hao Chen, Jingdong Chen, Yi Yuan, Chunhua Shen

General AI

Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion and attention dilution. To overcome this, we introduce MemDreamer to decouple perception and reasoning, shifting long-video understanding into an agentic exploration process…

arxiv Score 24.5

Parametric Skills

2026-06-29 · Xuan Zhao, Haonan He, Qingyu Yang, Minglei Li, Jingqi Ye, Zelin Tan, Bo Wan, Peng Ye

Research Track A · General AI

Since intelligence fundamentally relies on efficient skill acquisition (Chollet, 2019), the ability to leverage skills is critical. For LLMs, skills, manually authored or extracted from task trajectories, are textual recipes encoding mature problem-solving experience and are critical to agentic capabilities. Despite wi…

arxiv Score 24.4

MixedPEFT: Combining Multiple PEFT Methods with Mixed Objectives for Unsupervised Domain Adaptation

2026-06-20 · Mohammed Rawhani, Dervis Karaboga, Ozkan Ufuk Nalbantoglu, Alper Basturk, Bahriye Akay

Research Track A · General AI

Pre-trained language models struggle when applied to new domains, as full fine-tuning is computationally expensive and prone to catastrophic forgetting. This study addresses this challenge by presenting a novel parameter-efficient strategy for unsupervised domain adaptation that combines custom PEFT architectures with …

arxiv Score 24.3

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation

2026-04-17 · Dian Shao, Zhengzheng Xu, Peiyang Wang, Like Liu, Yule Wang, Jieqi Shi, Jing Huo

General AI

UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi-step instructions over long horizons. Existing zero-shot methods remain limited, as they often rely on large base models, generic prompts, and loosely coordinated mod…

arxiv Score 24.3

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

2026-04-21 · Shuai Wang, Hongyi Zhu, Jia-Hong Huang, Yixian Shen, Chengxi Zeng, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

General AI

Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence groun…

arxiv Score 24.3

Context-Aware RL for Agentic and Multimodal LLMs

2026-06-15 · Peiyang Xu, Bangzheng Li, Sijia Liu, Karthik R. Narasimhan, Pramod Viswanath, Prateek Mittal, Xingyu Fu

General AI

Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle detail in an image. We propose ContextRL, a context-aware reinforcement learning (RL) method that improves long-horizon r…

arxiv Score 24.0

Universe Routing: Why Self-Evolving Agents Need Epistemic Control

2026-03-16 · Zhaohui Geoffrey Wang

Research Track A · General AI

A critical failure mode of current lifelong agents is not lack of knowledge, but the inability to decide how to reason. When an agent encounters "Is this coin fair?" it must recognize whether to invoke frequentist hypothesis testing or Bayesian posterior inference - frameworks that are epistemologically incompatible. M…

arxiv Score 24.0

Face-D$^2$CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection

2026-04-09 · Yushuo Zhang, Yu Cheng, Yongkang Hu, Jiuan Zhou, Jiawei Chen, Yuan Xie, Zhaoxia Yin

Research Track A

The rapid advancement of facial forgery techniques poses severe threats to public trust and information security, making facial DeepFake detection a critical research priority. Continual learning provides an effective approach to adapt facial DeepFake detection models to evolving forgery patterns. However, existing met…

arxiv Score 24.0

Continual Calibration: Coverage Can Collapse Before Accuracy in Lifelong LLM Fine-Tuning

2026-04-27 · Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

Research Track A · General AI

Continual learning for large language models is typically evaluated through accuracy retention under sequential fine-tuning. We argue that this perspective is incomplete, because uncertainty reliability can degrade earlier and more sharply than top-1 performance. We study this empirically by measuring conformal coverag…

huggingface Score 24.0

GrepSeek: Training Search Agents for Direct Corpus Interaction

2026-05-28 · Alireza Salemi, Chang Zeng, Atharva Nijasure, Jui-Hui Chung, Razieh Rahimi, Fernando Diaz, Hamed Zamani

General AI

Large Language Model (LLM) search agents have shown strong promise for knowledge-intensive language tasks through multiple rounds of reasoning and information retrieval. Most existing systems access information using a retriever that takes a keyword or natural language query and returns a ranked list of documents using…

arxiv Score 23.9

CADRE: Stable, Parameter Efficient Adaptation of Medical Vision Language Models with Bounded Forgetting and Prior Drift

2026-06-22 · Amrita Singh, Rishabh Jha

Research Track A

Medical vision-language models (VLMs) such as BiomedCLIP generalize broadly, but adapting them to a clinical service is as much a safety problem as an accuracy one. Updating a deployed model for a new imaging modality can fail silently in two ways that harm patients: it can forget modalities it already handled (catastr…

arxiv Score 23.8

STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?

2026-05-07 · Hanxiang Chao, Yihan Bai, Rui Sheng, Tianle Li, Yushi Sun

Research Track A · General AI

Large Language Model (LLM) agents are increasingly expected to maintain coherent, long-term personalized memory, yet current benchmarks primarily measure static fact retrieval, overlooking the ability to revise stored beliefs when new evidence emerges. We identify a critical and underexplored failure mode, Implicit Con…

huggingface Score 23.5

PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents

2026-04-12 · Mikhail Menschikov, Dmitry Evseev, Victoria Dochkina, Ruslan Kostoev, Ilia Perepechkin, Petr Anokhin, Nikita Semenov, Evgeny Burnaev

General AI

Personalizing language models by effectively incorporating user interaction history remains a central challenge in the development of adaptive AI systems. While large language models (LLMs), combined with Retrieval-Augmented Generation (RAG), have improved factual accuracy, they often lack structured memory and fail to…

arxiv Score 23.5

Fact4ac at the Financial Misinformation Detection Challenge Task: Reference-Free Financial Misinformation Detection via Fine-Tuning and Few-Shot Prompting of Large Language Models

2026-04-16 · Cuong Hoang, Le-Minh Nguyen

Research Track A · General AI

The proliferation of financial misinformation poses a severe threat to market stability and investor trust, misleading market behavior and creating critical information asymmetry. Detecting such misleading narratives is inherently challenging, particularly in real-world scenarios where external evidence or supplementar…

arxiv Score 23.3

Enhancing Web Agents with a Hierarchical Memory Tree

2026-03-07 · Yunteng Tan, Zhi Gao, Xinxiao Wu

Research Track B · General AI

Large language model-based web agents have shown strong potential in automating web interactions through advanced reasoning and instruction following. While retrieval-based memory derived from historical trajectories enables these agents to handle complex, long-horizon tasks, current methods struggle to generalize acro…

arxiv Score 23.3

MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

2026-04-06 · Shu Wang, Edwin Yu, Oscar Love, Tom Zhang, Tom Wong, Steve Scargall, Charles Fan

General AI

Large Language Model (LLM) agents require persistent memory to maintain personalization, factual continuity, and long-horizon reasoning, yet standard context-window and retrieval-augmented generation (RAG) pipelines degrade over multi-session interactions. We present MemMachine, an open-source memory system that integr…

arxiv Score 23.3

A History-Aware Visually Grounded Critic for Computer Use Agents

2026-06-09 · Jaewoo Lee, Zaid Khan, Archiki Prasad, Justin Chih-Yao Chen, Supriyo Chakraborty, Kartik Balasubramaniam, Sambit Sahu, Elias Stengel-Eskin, Hyunji Lee, Mohit Bansal

Research Track A · Research Track B · General AI

Various test-time interventions for Computer Use Agents (CUAs), including critic models, have been developed to improve performance through pre-execution action evaluation in complex Graphical User Interface (GUI) environments. However, existing critics suffer from two key limitations: they (1) focus primarily on short…

arxiv Score 23.3

Native Active Perception as Reasoning for Omni-Modal Understanding

2026-06-17 · Zhenghao Xing, Ruiyang Xu, Yuxuan Wang, Jinzheng He, Ziyang Ma, Qize Yang, Yunfei Chu, Jin Xu, Junyang Lin, Chi-Wing Fu, Pheng-Ann Heng

General AI

Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre-scanning, and their context cost still …

arxiv Score 22.8

Self-Evolving World Models for LLM Agent Planning

2026-06-29 · Xuan Zhang, Wenxuan Zhang, See-Kiong Ng, Yang Deng

General AI

World models offer a principled way to equip long-horizon LLM agents with foresight: predictions of action consequences before execution. However, unreliable foresight can be ignored, misused, or even degrade downstream decision-making. In this paper, we introduce WorldEvolver, a self-evolving world model framework tha…

huggingface Score 22.8

AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

2026-07-02 · Xiangchen Cheng, Yunwei Jiang, Jianwen Sun, Zizhen Li, Chuanhao Li, Xiangcheng Cao, Yihao Liu, Fanrui Zhang, Li Jin, Kaipeng Zhang

General AI

Memory for a long-horizon LLM agent is a contract about what each future decision is allowed to see. The simplest contract appends past observations, tool calls, and reflections to every prompt, which makes prior context easy to access but also turns it into a jumbled mixture in which the effect of any single memory co…

arxiv Score 22.5

Continual Learning in Large Language Models: Methods, Challenges, and Opportunities

2026-03-13 · Hongyang Chen, Zhongwu Sun, Hongfei Ye, Kunchi Li, Xuemin Lin

Research Track A · General AI

Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting, a critical limitation of the static pre-training paradigm inherent to modern LLMs. This survey presents a comprehensiv…

arxiv Score 22.5

Temporal Memory for Resource-Constrained Agents: Continual Learning via Stochastic Compress-Add-Smooth

2026-03-31 · Michael Chertkov

Research Track A · General AI

An agent that operates sequentially must incorporate new experience without forgetting old experience, under a fixed memory budget. We propose a framework in which memory is not a parameter vector but a stochastic process: a Bridge Diffusion on a replay interval $[0,1]$, whose terminal marginal encodes the present and …

huggingface Score 22.5

Memory Intelligence Agent

2026-04-06 · Jingyang Qiao, Weicheng Meng, Yu Cheng, Zhihang Lin, Zhizhong Zhang, Xin Tan, Jingyu Gong, Kun Shao, Yuan Xie

General AI

Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, while suffering from key li…

arxiv Score 22.5

Information as Structural Alignment: A Dynamical Theory of Continual Learning

2026-04-08 · Radu Negulescu

Research Track A · General AI

Catastrophic forgetting is not an engineering failure. It is a mathematical consequence of storing knowledge as global parameter superposition. Existing methods, such as regularization, replay, and frozen subnetworks, add external mechanisms to a shared-parameter substrate. None derives retention from the learning dyna…

arxiv Score 22.5

BID-LoRA: A Parameter-Efficient Framework for Continual Learning and Unlearning

2026-04-14 · Jagadeesh Rachapudi, Ritali Vatsi, Praful Hambarde, Amit Shukla

Research Track A · General AI

Recent advances in deep learning underscore the need for systems that can not only acquire new knowledge through Continual Learning (CL) but also remove outdated, sensitive, or private information through Machine Unlearning (MU). However, while CL methods are well-developed, MU techniques remain in early stages, creati…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.5

TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning

2026-04-28 · Dominik Żurek, Kamil Faber, Marcin Pietron, Paweł Gajewski, Roberto Corizzo

Research Track A · General AI

Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time, but adapting the model in live environment interactions is expensive, risky, …

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.5

Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns

2026-06-16 · Shiqi He, Yue Cui, Feijie Wu, Xinyu Ma, Jiaheng Lu, Yaliang Li, Bolin Ding, Mosharaf Chowdhury

Research Track B · General AI

Large language model (LLM) web agents are usually deployed as tool callers: each turn, the model reads a fresh page observation and emits one structured tool action. When every action is a low-level primitive, horizons grow quickly and so do policy-facing LLM completions, dominating latency and cost on benchmarks such …

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.4

Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning

2026-06-25 · Tianyi Men, Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

Research Track B · General AI

Multimodal web agents can assist humans in operating repetitive GUI tasks, where effective task planning is essential for decomposing complex tasks into executable actions. While small open source MLLMs are cost efficient and privacy preserving compared with commercial large models, they suffer from weak planning and l…

Review: pending
Role: unreviewed
Read: now

huggingface Score 22.4

When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

2026-06-26 · Yiling Tao, Shihan Deng, Meiling Tao, Pengzhi Wei, Zhichao Hu, Zhihao Zhu

General AI

Search agents powered by large language models (LLMs) are increasingly used to solve complex information-seeking tasks, requiring multi-step retrieval and reasoning to fulfill user goals. However, existing benchmarks often assume that user queries are complete and explicit, overlooking the fact that real-world search r…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

2026-03-20 · Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette

Research Track B · General AI

Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing L…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

Multi-Agent Video Recommenders: Evolution, Patterns, and Open Challenges

2026-04-02 · Srivaths Ranganathan, Abhishek Dharmaratnakar, Anushree Sinha, Debanshu Das

General AI

Video recommender systems are among the most popular and impactful applications of AI, shaping content consumption and influencing culture for billions of users. Traditional single-model recommenders, which optimize static engagement metrics, are increasingly limited in addressing the dynamic requirements of modern pla…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

2026-04-09 · Wenbo Hu, Xin Chen, Yan Gao-Tian, Yihe Deng, Nanyun Peng, Kai-Wei Chang

General AI

Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challenges: the extreme vari…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

2026-04-21 · Josue Torres-Fonseca, Naihao Deng, Yinpei Dai, Shane Storks, Yichi Zhang, Rada Mihalcea, Casey Kennington, Joyce Chai

General AI

Multimodal Large Language Models are increasingly adopted as autonomous agents in interactive environments, yet their ability to proactively address safety hazards remains insufficient. We introduce SafetyALFRED, built upon the embodied agent benchmark ALFRED, augmented with six categories of real-world kitchen hazards…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology

2026-04-27 · Soyeon Kim, Cheongwoong Kang, Myeongjin Lee, Eun-Chul Chang, Jaedeok Lee, Jaesik Choi

General AI

The development of practical (multimodal) large language model assistants for Korean weather forecasters is hindered by the absence of a multidimensional, expert-level evaluation framework grounded in authoritative sources. To address this, we introduce K-MetBench, a diagnostic benchmark grounded in national qualificat…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation

2026-04-30 · Jing Zhang, Wentao Jiang, Tao Huang, Zhiwei Wang, Jianxin Liu, Jian Chen, Ping Ye, Gang Wang, Zengmao Wang, Bo Du, Dacheng Tao

General AI

Ultrasound interpretation requires both precise lesion localization and holistic clinical reasoning, yet existing methods typically excel at only one of these capabilities: specialized detectors offer strong localization but limited reasoning, whereas multimodal large language models (MLLMs) provide flexible reasoning …

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

2026-06-01 · Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai, Wenlin Yao, Hao Cheng, Baolin Peng, Huan Zhang, Tong Zhang, Jianfeng Gao

Research Track B · General AI

Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large collections of curated w…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

2026-06-03 · Bo Mao, Jie Zhou, Yutao Yang, Xin Li, Xian Wei, Qin Chen, Xingjiao Wu, Liang He

Research Track A · General AI

Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. However, existing lifelong learning agents for long-horizon tasks typically depend on retrieving discrete skills or past experiences with static parameters during inference, which prevents them from contin…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning

2026-06-10 · Shang Ma, Jisheng Dang, Wencan Zhang, Yifan Zhang, Bimei Wang, Hong Peng, Bin Hu, Qi Tian, Tat-Seng Chua

General AI

We propose a multi-agent collaborative framework built upon a lightweight Multimodal Large Language Model (MLLM), specifically designed for social intelligence reasoning. A key feature of our approach is that both the training and inference phases are augmented via knowledge distillation. Within this architecture, mult…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.3

Hidden Forgetting in Continual Multimodal Learning: When Accuracy Survives but Grounding Fails

2026-07-02 · Qianyu Chen, Canran Xiao, Runxuan Tang

Research Track A · General AI

Multimodal large language models must continually adapt to evolving tasks and domains, yet standard continual learning metrics mainly measure whether old answers remain correct, leaving the stability of multimodal grounding largely unexamined. We study this overlooked failure mode and ask whether a continually adapted …

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.2

CineCap: Structured Reasoning with Spatio-Temporal Anchors for Cinematographic Video Captioning

2026-06-23 · Xinyu Mao, Yuhui Zeng, Xiaokun Liu, Wenyu Qin, Meng Wang, Xin Tao, Pengfei Wan, Xiaohan Xing, Max Meng

General AI

Cinematographic captioning aims to describe how a video is filmed using professional film-language concepts such as camera movement, shot size, depth of field, composition, and shooting angle. This capability is important for fine-grained video understanding and controllable movie-quality video generation, yet remains …

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.2

TRUSTMEM: Learning Trustworthy Memory Consolidation for LLM Agents with Long-Term Memory

2026-06-23 · Tianyu Yang, Sudipta Paul, Vijay Srinivasan, Vivek Kulkarni, Srinivas Chappidi

Research Track A · General AI

Large language model (LLM) agents rely on long-term memory to support extended interactions and personalized assistance beyond finite context windows. Existing memory agents actively update external memory through generated write, revise, and delete operations, but these updates may omit important information, corrupt …

Review: pending
Role: unreviewed
Read: now

huggingface Score 22.0

Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

2026-05-11 · Shijue Huang, Hangyu Guo, Chenxin Li, Junting Lu, Xinyu Geng, Zhaochen Su, Zhenyu Li, Shuang Chen, Hongru Wang, Yi R. Fung

General AI

Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as transient outputs, …

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.0

Learning, Fast and Slow: Towards LLMs That Adapt Continually

2026-05-12 · Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, Joseph E. Gonzalez, Matei Zaharia, Kurt Keutzer, Inderjit S Dhillon, Rishabh Agarwal, Devvrit Khatri

Research Track A · General AI

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM parameters can chea…

Review: pending
Role: unreviewed
Read: now

huggingface Score 22.0

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

2026-05-21 · Jinho Park, Youbin Kim, Hogun Park, Eunbyung Park

General AI

Spatio-temporal reasoning is a core capability for Multimodal Large Language Models (MLLMs) operating in the real world. As such, evaluating it precisely has become an essential challenge. However, existing spatio-temporal reasoning benchmark datasets primarily rely on static image sets or passively curated video data,…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.0

Janus-LoRA: A Balanced Low-Rank Adaptation for Continual Learning

2026-05-27 · Cheng Chen, Pengpeng Zeng, Yuyu Guo, Lianli Gao, Hengtao Shen, Jingkuan Song

Research Track A · General AI

Low-Rank Adaptation (LoRA) has emerged as a promising paradigm for Continual Learning. It independently updates its low-rank factors ($A$ and $B$), creating a composite update to the full weight matrix through their interaction. To prevent catastrophic forgetting, this update should remain orthogonal to the task-specif…

Review: pending
Role: unreviewed
Read: now

arxiv Score 22.0

Evaluating the Impact of Task Granularity on Catastrophic Forgetting in Continual Learning

2026-06-06 · Emre Alyamac, Himanshu Janmeda, Shashwat Krishna, Yash Vijay

Research Track A

Catastrophic forgetting, the abrupt loss of previously acquired knowledge upon learning new information, remains the central challenge in Continual Learning. This project investigates whether the order in which a model learns information affects how well it retains knowledge. Specifically, we ask: does learning general…

Review: pending
Role: unreviewed
Read: now

huggingface Score 21.5

PersonaVLM: Long-Term Personalized Multimodal LLMs

2026-03-20 · Chang Nie, Chaoyou Fu, Yifan Zhang, Haihua Yang, Caifeng Shan

General AI

Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture use…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.5

Improving Sparse Memory Finetuning

2026-04-06 · Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta

Research Track A · General AI

Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, like full finetuning or parameter-efficient methods (e.g., LoRA), face a fundamental trade-off: cat…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.5

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

2026-04-07 · Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz

Research Track B · General AI

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully exe…

Review: pending
Role: unreviewed
Read: now

huggingface Score 21.5

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

2026-04-20 · Xinping Lei, Xinyu Che, Junqi Xiong, Chenchen Zhang, Yukai Huang, Chenyu Zhou, Haoyang Huang, Minghao Liu, Letian Zhu, Hongyi Ye, Jinhua Hao, Ken Deng, Zizheng Zhan, Han Li, Dailin Li, Yifan Yao, Ming Sun, Zhaoxiang Zhang, Jiaheng Liu

General AI

Large language models are rapidly evolving into interactive coding agents capable of end-to-end web coding, yet existing benchmarks evaluate only narrow slices of this capability, typically text-conditioned generation with static-correctness metrics, leaving visual fidelity, interaction quality, and codebase-level reas…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.5

Balancing Stability and Plasticity in Sequentially Trained Early-Exiting Neural Networks

2026-05-06 · Alaa Zniber, Ouassim Karrakchou, Mounir Ghogho

Research Track A · General AI

Early-exiting neural networks enable adaptive inference by allowing inputs to exit at intermediate classifiers, reducing computation for easy samples while maintaining high accuracy. In practice, exits can be trained sequentially by incrementally adding them to a shared backbone; however, this sequential training can c…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.5

Online Continual Learning with Dynamic Label Hierarchies

2026-05-12 · Xinrui Wang, Shao-Yuan Li, Bartłomiej Twardowski, Alexandra Gomez-Villa, Songcan Chen

Research Track A · General AI

Online Continual Learning (OCL) aims to learn from endless non-stationary data streams, yet most existing methods assume a flat label space and overlook the hierarchical organization of real-world concepts that evolves both horizontally (sibling classes) and vertically (coarse or fine categories). To bett…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.5

Stop Marginalizing My Dreams: Model Inversion via Laplace Kernel for Continual Learning

2026-05-12 · Patryk Krukowski, Jacek Tabor, Przemysław Spurek, Marek Śmieja, Łukasz Struski

Research Track A · General AI

Data-free continual learning (DFCIL) relies on model inversion to synthesize pseudo-samples and mitigate catastrophic forgetting. However, existing inversion methods are fundamentally limited by a simplifying assumption: they model feature distributions using diagonal covariance, effectively ignoring correlations that …

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.5

SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation

2026-05-21 · Javad Parsa, Enis Simsar, Amir Joudaki, Thomas Hofmann, André M. H. Teixeira

Research Track A · General AI

Parameter-efficient fine-tuning enables fast personalization of text-to-image diffusion models, but composing multiple custom concepts remains challenging due to representation interference. Existing modular methods either rely on expensive post-hoc fusion or freeze adaptation subspaces, which limit expressiveness and …

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.5

Parameter-Efficient Continual Learning for Automatic Speech Recognition

2026-06-08 · Steven Vander Eeckt, Hugo Van hamme

Research Track A

Speech foundation models enable strong general-purpose ASR and are attractive for downstream adaptation. However, their size and the catastrophic forgetting induced by sequential fine-tuning demand parameter-efficient and regularized training methods, motivating parameter-efficient continual learning (PECL). While PECL…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.5

The Forgetting-Retention Dilemma: Certified Unlearning Theory in Continual Learning

2026-06-29 · Yiting Hu, Lingjie Duan, Qian Zhang

Research Track A

Machine unlearning aims to eliminate the influence of specific data from trained models to safeguard privacy. However, this presents a significant challenge in the context of continual learning (CL), where models update sequentially on dynamic datasets. A major limitation is that current certified unlearning algorithms…

Review: pending
Role: unreviewed
Read: now

huggingface Score 21.4

Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

2026-06-23 · Shiding Zhu, Yudi Qi, Yajie Wang, Jiaze Li, Chao Song, Yaorui Shi, Yibo Miao, Hanqi Gao, Kai Zhang

General AI

Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent executes tasks, summarizes outcomes, and determines memory content. This setup makes agents …

Review: pending
Role: unreviewed
Read: now

huggingface Score 21.4

Qwen-AgentWorld: Language World Models for General Agents

2026-06-23 · Yuxin Zuo, Zikai Xiao, Li Sheng, Fei Huang, Jianhong Tu, Yuxuan Liu, Tianyi Tang, Xiaomeng Hu, Yang Su, Qingfeng Lan, Yantao Liu, Qin Zhu, Yinger Zhang, Bowen Yu, Haiquan Zhao, Haiyang Xu, Jianxin Yang, Jiayang Cheng, Junyang Wang, Lianghao Deng, Mingfeng Xue, Tianyi Bai, Yang Fan, Yubo Ma, Yucheng Li, Zeyu Cui, Zhihai Wang, Zhihui Xie, Zhuorui Ye, An Yang, Dayiheng Liu, Jingren Zhou, Ning Ding

General AI

A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling based on language models can further push the boundaries of general agents. (i) We first focus on building foundation m…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

2026-03-23 · Shoubin Yu, Lei Shu, Antoine Yang, Yao Fu, Srinivas Sunkara, Maria Wang, Jindong Chen, Mohit Bansal, Boqing Gong

Research Track B · General AI

Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution. However, current web-agent benchmarks suffer from a critical limitation: they focus entirely on web-based interaction and perception, lacking grounding in the user's real-world physical surroundings. This li…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

2026-04-09 · Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang, Kunyu Shi, Guannan Zhang, Ruixuan Li, Yixiong Zou

General AI

The advent of agentic multimodal models has empowered systems to actively interact with external environments. However, current agents suffer from a profound meta-cognitive deficit: they struggle to arbitrate between leveraging internal knowledge and querying external utilities. Consequently, they frequently fall prey …

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization

2026-04-22 · Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng, Fengying Xie, Zhiguo Jiang, Cao Liu, Ke Zeng, Haopeng Zhang

General AI

We introduce V-tableR1, a process-supervised reinforcement learning framework that elicits rigorous, verifiable reasoning from multimodal large language models (MLLMs). Current MLLMs trained solely on final outcomes often treat visual reasoning as a black box, relying on superficial pattern matching rather than perform…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

2026-04-29 · GLM-V Team, Wenyi Hong, Xiaotao Gu, Ziyang Pan, Zhen Yang, Yuting Wang, Yue Wang, Yuanchang Yue, Yu Wang, Yanling Wang, Yan Wang, Xijun Liu, Wenmeng Yu, Weihan Wang, Wei Li, Shuaiqi Duan, Sheng Yang, Ruiliang Lv, Mingdao Liu, Lihang Pan, Ke Ning, Junhui Ji, Jinjiang Wang, Jing Chen, Jiazheng Xu, Jiale Zhu, Jiale Cheng, Ji Qi, Guobing Gan, Guo Wang, Cong Yao, Zijun Dou, Zihao Zhou, Zihan Wang, Zhiqi Ge, Zhijie Li, Zhenyu Hou, Zhao Xue, Zehui Wang, Zehai He, Yusen Liu, Yukuo Cen, Yuchen Li, Yuan Wang, Yijian Lu, Yanzi Wang, Yadong Xue, Xinyu Zhang, Xinyu Liu, Wenkai Li, Tianyu Tong, Tianshu Zhang, Shengdong Yan, Qinkai Zheng, Mingde Xu, Licheng Bao, Jiaxing Xu, Jiaxin Fan, Jiawen Qian, Jiali Chen, Jiahui Lin, Haozhi Zheng, Haoran Wang, Haochen Li, Fan Yang, Dan Zhang, Chuangxin Zhao, Chengcheng Wu, Boyan Shi, Bowei Jia, Baoxu Wang, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Minlie Huang, Yuxiao Dong, Jie Tang, V Team

General AI

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, video…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

2026-06-04 · Shangheng Du, Xiangchao Yan, Jinxin Shi, Zongsheng Cao, Shiyang Feng, Zichen Liang, Boyuan Sun, Tianshuo Peng, Yifan Zhou, Xin Li, Jie Zhou, Liang He, Bo Zhang, Lei Bai

General AI

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless search, and lack of hiera…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

IS-CoT: Breaking the Long-form Generation Collapse via Interleaved Structural Thinking

2026-06-08 · Zechen Sun, Yuyang Sun, Zecheng Tang, Juntao Li, Wenpeng Hu, Wenliang Chen, Zhunchen Luo, Guotong Geng, Min Zhang

General AI

Generating coherent and controllable long-form content remains a persistent challenge for Large Language Models (LLMs). While reasoning-enhanced models have demonstrated success in logic-intensive domains, our evaluation reveals that they suffer from a severe length collapse in open-ended writing, where performance deg…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents

2026-06-09 · Yv Zhang, Hao Sun, Hao Fang, Kuofeng Gao, Fan Mo, Bin Chen, Shu-Tao Xia, Yaowei Wang

Research Track B · General AI

External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However, this paradigm introduces a critical vulnerability: malicious content injected into memory can be persistently recalled and repeatedly influence agent behavior. In this wo…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

2026-06-09 · Heming Zou, Qi Wang, Yun Qu, Yuhang Jiang, Lizhou Cai, Yixiu Mao, Ru Peng, Xin Xu, Weijie Liu, Kai Yang, Saiyong Yang, Xiangyang Ji

General AI

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is often limited by insufficient reward contrast, arising when overly simple or complex prompts generate low-variance feedba…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

2026-06-11 · Tanmoy Kanti Halder, Akash Ghosh, Subhadip Baidya, Arijit Roy, Sriparna Saha

General AI

Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where patients often express…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

MiniMax Sparse Attention

2026-06-11 · Xunhao Lai, Weiqi Xu, Yufeng Yang, Qiaorui Chen, Yang Xu, Lunbin Zeng, Xiaolong Li, Haohai Sun, Haichao Zhu, Vito Zhang, Pengyu Zhao

General AI

Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untenable at deployment sc…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

Reward Modeling for Multi-Agent Orchestration

2026-06-11 · King Yeung Tsang, Zihao Zhao, Vishal Venkataramani, Haizhou Shi, Zixuan Ke, Semih Yavuz, Shafiq Joty, Hao Wang

General AI

Multi-Agent Systems (MAS) built on Large Language Models (LLMs) require effective orchestration to coordinate specialized agents, yet training such orchestrators is hindered by limited supervision and high computational cost. We propose Orchestration Reward Modeling (OrchRM), a self-supervised framework for evaluating …

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.3

Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

2026-06-17 · Shengyuan Ding, Xilin Wei, Xinyu Fang, Haodong Duan, Dahua Lin, Jiaqi Wang, Yuhang Zang

Research Track A · General AI

Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. However, existing benchmarks either expose the full state, conflate hidden-state reconstruction with other agent skills, or test recall only after an episode has ended. W…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.2

Are We Ready For An Agent-Native Memory System?

2026-06-23 · Wei Zhou, Xuanhe Zhou, Shaokun Han, Hongming Xu, Guoliang Li, Zhiyu Li, Feiyu Xiong, Fan Wu

Research Track A · General AI

Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that supports persistent information storage, retrieval, update, consolidation, and dynamic lifecycle governance throughout agent execution. Despite this evolution, existing evaluati…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.2

Same Evidence, Different Answer: Auditing Order Sensitivity in Multimodal Large Language Models

2026-06-24 · Akshay Paruchuri, Sanmi Koyejo, Ehsan Adeli

General AI

Standard benchmarks for multimodal large language models (MLLMs) score each item on one canonical ordering and miss whether order-irrelevant shuffling changes the answer, a baseline reliability property called for by emerging AI evaluation guidelines. We introduce Facet-Probe, a five-facet audit (option, evidence-chunk…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.0

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

2026-04-22 · Noah Flynn

Research Track A · General AI

Large language models (LLMs) often exhibit performance disparities across languages, with naive multilingual fine-tuning frequently degrading performance due to negative cross-lingual interference. To address this, we introduce COMPASS (COntinual Multilingual PEFT with Adaptive Semantic Sampling), a novel data-centric …

Review: pending
Role: unreviewed
Read: now

huggingface Score 21.0

Audio-Visual Intelligence in Large Foundation Models

2026-05-05 · You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei

General AI

Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines that can perceive, generate, and interact in the multimodal real world. In the era of large foundation models, joint modeling of audio and vision has become increasing…

Review: pending
Role: unreviewed
Read: now

arxiv Score 21.0

Learning What to Remember: A Cognitively Grounded Multi-Factor Value Model for Agentic Memory

2026-06-11 · Zhibao Chen, Qian Cheng

Research Track A · General AI

Long-running LLM agents accumulate interaction histories far larger than any context window, forcing a standing decision: what to encode deeply, what to forget, and what to retrieve under a fixed memory budget. Production systems answer with semantic similarity or recency -- both mis-specified for the forgetting decisi…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 21.0

RL-Index: Reinforcement Learning for Retrieval Index Reasoning

2026-06-15 · Yongjia Lei, Nedim Lipka, Zhisheng Qi, Utkarsh Sahu, Koustava Goswami, Franck Dernoncourt, Ryan A. Rossi, Yu Wang

General AI

Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit and complex reasoning beyond surface-level semantic or lexical matching (e.g., mathematical problems relying on the same theorem or codin…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 21.0

Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning

2026-06-30 · Junha Jung, Minbyul Jeong, Suhyeon Lim, Sungwook Jung, Jaehoon Yun, Taeyun Roh, Mujeen Sung, Jaewoo Kang

General AI

Recent multimodal large language models have shown great promise in clinical image reasoning, but existing post-training pipelines remain predominantly outcome-centric, relying on final answer correctness or sequence-level preferences. This suffers from sparse credit assignment, making it difficult to optimize the reas…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.8

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

2026-05-07 · Bodong Du, Bowen Liu, Yang Yu, Xinpeng Ding, Zhiheng Wu, Shuning Wang, Shuo Nie, Naiming Liu, Qifeng Chen, Yangqiu Song, Xiaomeng Li

General AI

Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures contain highly redundant anatomical views, while decisive evidence is temporally sparse,…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.8

MEME: Multi-entity & Evolving Memory Evaluation

2026-05-12 · Seokwon Jung, Alexander Rubinstein, Arnas Uselis, Sangdoo Yun, Seong Joon Oh

General AI

LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.8

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

2026-05-29 · Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

General AI

Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has shown promise for this task, yet existing methods are limited by low-confusability distractor…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.8

SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence

2026-05-29 · Yulu Pan, Han Yi, Seongsu Ha, Md Mohaiminul Islam, Benjamin Zhang, Lorenzo Torresani, Gedas Bertasius

General AI

True video intelligence demands more than recognizing what is visible: it requires reasoning about why events unfold, predicting what would change under different conditions, and deciding what to do next. We refer to this progression, from perception through causal reasoning and simulation to strategic planning, as Str…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.6

Visually Grounded Self-Reflection for Vision-Language Models via Reinforcement Learning

2026-07-02 · Liyan Tang, Fangcong Yin, Greg Durrett

General AI

Large vision-language models can reason over multimodal inputs by generating textual chains of thought (CoT). A key capability exhibited in CoT reasoning is self-reflection: revisiting earlier decisions and correcting previous errors. However, existing LVLMs often fail to properly attend to visual inputs during reflect…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 20.5

FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

2026-03-26 · Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Yong Liu, Chi Zhang

General AI

This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) in solving real-world financial problems through tool invocation of financial model context protocols. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic us…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 20.5

Learning to Retrieve from Agent Trajectories

2026-03-30 · Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen

General AI

Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasi…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 20.5

Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design

2026-03-30 · Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo

General AI

Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bo…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 20.5

Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search

2026-04-09 · Chuzhan Hao, Wenfeng Feng, Guochao Jiang, Guofeng Quan, Guohua Liu, Yuewei Zhang

General AI

Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcom…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 20.5

OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video

2026-04-13 · Junfu Pu, Yuxin Chen, Teng Wang, Ying Shan

General AI

Current multimodal large language models (MLLMs) have demonstrated remarkable capabilities in short-form video understanding, yet translating long-form cinematic videos into detailed, temporally grounded scripts remains a significant challenge. This paper introduces the novel video-to-script (V2S) task, aiming to gener…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 20.5

EasyVideoR1: Easier RL for Video Understanding

2026-04-18 · Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Dingyu Yao, Zheng Lin, Peng Fu, Nan Duan, Jiaqi Wang

General AI

Reinforcement learning from verifiable rewards (RLVR) has demonstrated remarkable effectiveness in improving the reasoning capabilities of large language models. As models evolve into natively multimodal architectures, extending RLVR to video understanding becomes increasingly important yet remains largely unexplored, …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 20.5

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

2026-04-22 · Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha

General AI

Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under delayed rewards and partial observability. Games are a good testbed for evaluating agent …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.5

Cortex-Inspired Continual Learning: Unsupervised Instantiation and Recovery of Functional Task Networks

2026-04-27 · Kevin McKee, Thomas Hazy, Yicong Zheng, Zacharie Bugaud, Thomas Miconi

Research Track A · General AI

Block-sequential continual learning demands that a single model both protect prior solutions from catastrophic forgetting and efficiently infer at inference time which prior solution matches the current input without task labels. We present Functional Task Networks (FTN), a parameter-isolation method inspired by struct…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.5

Region4Web: Rethinking Observation Space Granularity for Web Agents

2026-05-08 · Donguk Kwon, Dongha Lee

Research Track B · General AI

Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as the action space, leaving the page's functional organization implicit and forcing the agent to infer it from element-leve…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.5

Web Agents Should Adopt the Plan-Then-Execute Paradigm

2026-05-14 · Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu, Sylvie Venuto, Jinhao Zhu, Raluca Ada Popa, David Wagner

Research Track B · General AI

ReAct has become the default architecture across LLM agents, and many existing web agents follow this paradigm. We argue that it is the wrong default for web agents. Instead, web agents should default to plan-then-execute: commit to a task-specific program before observing runtime web content, then execute it. The reas…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.5

Understanding Generalization and Forgetting in In-Context Continual Learning

2026-05-27 · Guangyu Li, Meng Ding, Lijie Hu

Research Track A · General AI

In-context learning (ICL) derives its power from enabling Large Language Models to adapt to new tasks via prompt-based reasoning alone, entirely bypassing the need for parameter updates. Existing theories primarily study ICL in single-task settings, while real-world prompts often contain sequences of heterogeneous task…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 20.5

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

2026-06-05 · Lingyong Yan, Can Xu, Yukun Zhao, Wenxuan Li, Qingyang Chen, Jiulong Wu, Wenli Song, Xiangnan Li, Weixian Shi, Yiqun Chen, Xuchen Ma, Yuchen Li, Jiashu Zhao, Shuaiqiang Wang, Jianmin Wu, Dawei Yin

General AI

Deep Research (DR) has emerged as a new agentic paradigm to tackle complex, open-ended research tasks, demanding systems that can iteratively frame problems, acquire evidence, verify sources, and synthesize long-form reports. In practice, however, current DR systems are constrained by four interrelated limitations: lon…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.5

Speculative Rollback Correction for Quality-Diverse Web Agent Imitation

2026-06-10 · Longkun Hao, Hongyu Lin, Hao Li, Zhichao Yang, Haojie Hao, Dongshuo Huang, Haitao Yang, Hongyu Ge, Ming jie Xie, Yanjun Wu, Zi Hao Yin, Yan Bai, Yihang Lou

Research Track B · General AI

Training interactive web agents through imitation learning from expert trajectories has emerged as a highly effective approach. However, determining the optimal timing for expert intervention presents a critical challenge in this context. Delayed intervention often leads to the accumulation of early-stage errors, pushi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

CL-VISTA: Benchmarking Continual Learning in Video Large Language Models

2026-04-01 · Haiyang Guo, Yichen Shi, Fei Zhu, Wenzhuo Liu, Hongbo Zhao, Fanhu Zeng, Shijie Ma, Da-Han Wang, Xu-Yao Zhang

Research Track A · General AI

Video Large Language Models (Video-LLMs) require continual learning to adapt to non-stationary real-world data. However, existing benchmarks fall short of evaluating modern foundation models: many still rely on models without large-scale pre-training, and prevailing benchmarks typically partition a single dataset into …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library

2026-04-07 · Md Shamimul Islam, Luis G. Jaimes, Ayesha S. Dina

Research Track A · General AI

Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods are effective for known attack patterns, but they struggle to detect zero-day attacks and often miss modified variants of previously known attacks, while many machine learning approaches offer limited interpretability. These …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games

2026-04-13 · Keyang Zhong, Junlin Xie, Hefeng Wu, Haofeng Li, Guanbin Li

General AI

Vision-language models (VLMs) have shown impressive capabilities in perceptual tasks, yet they degrade in complex multi-hop reasoning under multiplayer game settings with imperfect and deceptive information. In this paper, we study a representative multiplayer task, Murder Mystery Games, which require inferring hidden …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

GAM: Hierarchical Graph-based Agentic Memory for LLM Agents

2026-04-14 · Zhaofen Wu, Hanrong Zhang, Fulin Lin, Wujiang Xu, Xinran Xu, Yankai Chen, Henry Peng Zou, Shaowen Chen, Weizhi Zhang, Xue Liu, Philip S. Yu, Hongwei Wang

Research Track A · General AI

To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation

2026-04-20 · Xingchen Xiao, Heyan Huang, Runheng Liu, Jincheng Xie

General AI

Large language models (LLMs) are widely used in retrieval-augmented generation (RAG) to incorporate external knowledge at inference time. However, when retrieved contexts are noisy, incomplete, or heterogeneous, a single generation process often struggles to reconcile evidence effectively. We propose MASS-RAG,…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

SpaMEM: Benchmarking Dynamic Spatial Reasoning via Perception-Memory Integration in Embodied Environments

2026-04-24 · Chih-Ting Liao, Xi Xiao, Chunlei Meng, Zhangquan Chen, Yitong Qiao, Weilin Zhou, Tianyang Wang, Xu Zheng, Xin Cao

General AI

Multimodal large language models (MLLMs) have advanced static visual-spatial reasoning, yet they often fail to preserve long-horizon spatial coherence in embodied settings where beliefs must be continuously revised from egocentric observations under environmental change. We introduce SpaMEM (Spatial Memory from Action…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

DenseStep2M: A Scalable, Training-Free Pipeline for Dense Instructional Video Annotation

2026-04-29 · Mingji Ge, Qirui Chen, Zeqian Li, Weidi Xie

General AI

Long-term video understanding requires interpreting complex temporal events and reasoning over procedural activities. While instructional video corpora, like HowTo100M, offer rich resources for model training, they present significant challenges, including noisy ASR transcripts and inconsistent temporal alignments betw…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

2026-06-04 · Yasmine Omri, Ziyu Gan, Zachary Broveak, Robin Geens, Zexue He, Alex Pentland, Marian Verhelst, Tsachy Weissman, Thierry Tambe

General AI

LLM agents are increasingly deployed on long-horizon tasks requiring sustained reasoning over extended interaction histories. Realizing this at scale requires agents to persistently store, retrieve, and update their own memory across sessions. A rich ecosystem of agent memory systems has emerged spanning flat retrieval…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing

2026-06-04 · Yuxiao Ye, Haoran He, Fangyuan Kong, Xintao Wang, Pengfei Wan, Kun Gai, Ling Pan

General AI

Text-guided image editing has advanced rapidly with diffusion models and unified multimodal foundation models. However, most existing methods remain confined to single-turn settings, overlooking the more realistic scenario of multi-turn in-context editing, where users iteratively refine an image through a sequence of i…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

LUNA-AD: Lightweight Uncertainty-Aware Language Model with Lifelong Learning for Autonomous Driving

2026-06-07 · Ruoyu Yao, Pei Liu, Ruiguo Zhong, Mingxing Peng, Rui Yang, Jun Ma

Research Track A · General AI

While large language models (LLMs) offer promising reasoning capabilities, their integration into safety-critical driving systems is hindered by limited reasoning diversity, high computational overhead, and static learning paradigms. To address these challenges, we propose LUNA-AD, a lightweight uncertainty-aware langu…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.3

Gen-VCoT: Generative Visual Chain-of-Thought Reasoning via Diffusion-Based RGB Intermediate Representations

2026-06-15 · Zhiqiang Zhou, Junliang Dai, Xu Ling

General AI

Multimodal large language models (MLLMs) excel at visual reasoning but rely on text-based chain-of-thought (CoT), lacking interpretable visual intermediates. Existing methods use opaque tokens or external tools, missing key properties. We propose Gen-VCoT, a framework using expert vision models to generate RGB images a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.2

Agentic Collaborative Cognition for Zero-Shot 3D Understanding

2026-06-23 · Wenxin Wang, Bo Zhang, Feng Chen, Zixuan Wang, Wen Li, Changsheng Li, Yinjie Lei

General AI

Recent advancements have explored agentic zero-shot 3D understanding by reformulating it as video keyframe understanding with Multimodal Large Language Models (MLLMs). However, existing methods face an intrinsic bottleneck due to the finite observation perspectives inherent in videos and the implicit perception of 3D s…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.2

TriViewBench: Controlled Complexity Scaling for Multi-View Structural Reasoning in MLLMs

2026-06-24 · Yu-Yang Chen, Lan-Zhe Guo

General AI

Multimodal Large Language Models (MLLMs) demonstrate strong performance on standard visual question answering benchmarks, yet their scalability under controlled structural complexity remains poorly understood. We introduce TriViewBench, a controlled three-view visual reasoning benchmark constructed from synthetic 3D sc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

AI Planning Framework for LLM-Based Web Agents

2026-03-13 · Orit Shahnovsky, Rotem Dror

Research Track B · General AI

Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why they fail or how they plan. This paper addresses this gap by formally treating web tasks as sequ…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

WebXSkill: Skill Learning for Autonomous Web Agents

2026-04-14 · Zhaoyang Wang, Qianhui Wu, Xuchao Zhang, Chaoyun Zhang, Wenlin Yao, Fazle Elahi Faisal, Baolin Peng, Si Qin, Suman Nath, Qingwei Lin, Chetan Bansal, Dongmei Zhang, Saravan Rajmohan, Jianfeng Gao, Huaxiu Yao

Research Track B · General AI

Autonomous web agents powered by large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. A key bottleneck is the grounding gap in existing skill formulations: textual workflow skills provide natural language guidance but cannot be directl…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

Recovery Guarantees for Continual Learning of Dependent Tasks: Memory, Data-Dependent Regularization, and Data-Dependent Weights

2026-04-19 · Liangzu Peng, Uday Kiran Reddy Tadipatri, Ziqing Xu, Eric Eaton, René Vidal

Research Track A · General AI

Continual learning (CL) is concerned with learning multiple tasks sequentially without forgetting previously learned tasks. Despite substantial empirical advances over recent years, the theoretical development of CL remains in its infancy. At the heart of developing CL theory lies the challenge that the data distributi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent

2026-04-20 · Lingfeng Zhang, Yongan Sun, Jinpeng Hu, Hui Ma, Yang Ying, Kuien Liu, Zenglin Shi, Meng Wang

Research Track B · General AI

Recent advancements in large language models (LLMs) have empowered autonomous web agents to execute natural language instructions directly on real-world webpages. However, existing agents often struggle with complex tasks involving dynamic interactions and long-horizon execution due to rigid planning strategies and hal…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory

2026-04-22 · Yingjie Gu, Bo Xiong, Yijuan Guo, Chao Li, Xiaojing Zhang, Liqiang Wang, Pengcheng Ren, Qi Sun, Jingyao Ma, Shidang Shi

Research Track A · General AI

For LLM agents, memory management critically impacts efficiency, quality, and security. While much research focuses on retention, selective forgetting--inspired by human cognitive processes (hippocampal indexing/consolidation theory and Ebbinghaus forgetting curve)--remains underexplored. We argue that in resource-cons…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling

2026-04-29 · Fazle Elahi Faisal, Qianhui Wu, Baolin Peng, Jianfeng Gao

Research Track B · General AI

Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-quality web trajectory training data. Existing automatic trajectory generation methods suffer from incomplete website cov…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

HERCULES: Hardware-Efficient, Robust, Continual Learning Neural Architecture Search

2026-05-03 · Matteo Gambella, Fabrizio Pittorino, Manuel Roveri

Research Track A · General AI

Neural Architecture Search (NAS) has emerged as a powerful framework for automatically discovering neural architectures that balance accuracy and efficiency. However, as AI transitions from static benchmarks to real-world deployment, the traditional focus on hardware-aware efficiency is no longer sufficient. We observe…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

2026-05-07 · Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari

Research Track A · General AI

Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in thre…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

2026-05-25 · Bowen Wang, Dunjie Lu, Junli Wang, Tianyi Bai, Shixuan Liu, Zhipeng Zhang, Haiquan Wang, Hao Hu, Tianbao Xie, Shuai Bai, Dayiheng Liu, Que Shen, Junyang Lin, Tao Yu

Research Track B · General AI

Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use, and software engineering, yet its extension to computer-use agents (CUAs) has been bottlenecked by the scarcity of scalable training data with deterministic rewards. Constructing such data for CUAs requires…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 20.0

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

2026-05-28 · Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù, Leila Kosseim

Research Track B · General AI

Despite recent advances, LLM-based web agents still struggle with limited exploration, omission of critical steps, and sensitivity to task constraints. Prior work suggests that many of these failures stem from weaknesses in planning, yet the impact of alternative natural language plan representation remains unexplored.…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 19.9

The Gentle Collapse: Distributional Metrics for Continual Learning

2026-06-23 · Ahmed Anwar, Andreas Wagner, Federico Raue, Tobias Nauen, Andreas Dengel

Research Track A

Accuracy degradation is the standard metric for Catastrophic Forgetting (CF); however, it records only whether forgetting occurred. It saturates at the extremes and collapses discretely at task boundaries, hiding the internal structure of what is being forgotten. We introduce six softmax-derived metrics spanning…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 19.8

Towards Multi-Agent Autonomous Reasoning in Hydrodynamics

2026-05-01 · Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson

General AI

Single-agent systems (SAS) have become the default pattern for LLM-driven scientific workflows, but routing planning, tool use, and synthesis through a single context window comes with a well-known cost: as tool specifications and observational traces accumulate, the effective context available for each decision shrink…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 19.8

MemGym: a Long-Horizon Memory Environment for LLM Agents

2026-05-20 · Wujiang Xu, Yu Wang, Kai Mei, Kaiqu Liang, Zhenting Wang, Mingyu Jin, Han Zhang, Shi-Xiong Zhang, Wenyue Hua, Sambit Sahu, Dimitris N. Metaxas

Research Track A · Research Track B · General AI

Memory is a central capability for LLM agents operating across long-horizon tasks. Existing memory benchmarks predominantly evaluate retention of personalized information in multi-turn chat scenarios, overlooking the dynamic memory formation that occurs during extended agent execution. Consequently, the memory systems …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 19.8

MIRAGE: Stealthy Visual Prompt Injection for Vulnerability Detection in Web Agents

2026-06-16 · Xuelong Dai, Jianyu Ma, Boyang Ma, Biwei Yan, Yijun Yang, Yue Zhang

Research Track B · General AI

Multimodal Large Language Model (MLLM)-based web agents provide practical, high-precision solutions for visual browser automation; however, they inherently expand the attack surface, introducing novel vision-based vulnerabilities. Existing adversarial evaluations targeting these agents frequently rely on permissive thr…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 19.8

Customized Generative AI Agent for Transportation Engineering Practice: A Development and Continued Pre-training Guideline

2026-06-27 · Dianwei Chen, Yuan-Zheng Lei, Zifan Zhang, Yuchen Liu, Xianfeng Yang

General AI

Recent advancements in generative artificial intelligence (AI) and large language models (LLMs) have shown significant promise in automating complex reasoning, summarization, and question-answering tasks. However, the effectiveness of general-purpose LLMs in specialized engineering domains remains limited due to insuff…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 19.8

OmniCoT: A Benchmark for Global and Multi-Step Panoramic Reasoning

2026-06-29 · Haocong He, Chenfei Liao, Zichen Wen, Zihao Dongfang, Xu Zheng, Bin Ren, Chang Su, Zixin Zhang, Harold Haodong Chen, Hongfei Zhang, Weijia Li, Kailun Yang, Conghui He, Xuming Hu, Nicu Sebe, Linfeng Zhang

General AI

Multimodal Large Language Models (MLLMs) have demonstrated promising spatial reasoning capabilities, while these abilities remain underexplored in the emerging visual modality of panoramic imagery. The full 360°×180° field of view of panoramas essentially supports complex global multi-step reasoning, which is al…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 19.8

Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

2026-07-01 · Aryo Pradipta Gema, Beatrice Alex, Pasquale Minervini

General AI

In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than literally copy-pasting them. Identifying which attention heads perform this synthesis matters for interpreting long-context model behavior. Yet existing detectors miss these heads by construc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 19.6

ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning

2026-07-02 · Yanjun Zhao, Ruizhong Qiu, Tianxin Wei, Yuanchen Bei, Zhining Liu, Lingjie Chen, Ismini Lourentzou, Hanghang Tong, Jingrui He

General AI

Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they often fail to use relevant evidence that is already present in the input, revealing a gap between context…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 19.5

ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models

2026-04-09 · Chonghan Qin, Xiachong Feng, Weitao Ma, Xiaocheng Feng, Lingpeng Kong

General AI

Existing memory benchmarks for LLM agents evaluate explicit recall of facts, yet overlook implicit memory where experience becomes automated behavior without conscious retrieval. This gap is critical: effective assistants must automatically apply learned procedures or avoid failed actions without explicit reminders. We…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 19.5

CocoaBench: Evaluating Unified Digital Agents in the Wild

2026-04-13 · CocoaBench Team, Shibo Hao, Zhining Zhang, Zhiqi Liang, Tianyang Liu, Yuheng Zha, Qiyue Gao, Jixuan Chen, Zilong Wang, Zhoujun Cheng, Haoxiang Zhang, Junli Wang, Hexi Jin, Boyuan Zheng, Kun Zhou, Yu Wang, Feng Yao, Licheng Liu, Yijiang Li, Zhifei Li, Zhengtao Han, Pracha Promthaw, Tommaso Cerruti, Xiaohan Fu, Ziqiao Ma, Jingbo Shang, Lianhui Qin, Julian McAuley, Eric P. Xing, Zhengzhong Liu, Rupesh Kumar Srivastava, Zhiting Hu

General AI

LLM agents now perform strongly in software engineering, deep research, GUI automation, and various other applications, while recent agent scaffolds and models are increasingly integrating these capabilities into unified systems. Yet, most evaluations still test these capabilities in isolation, which leaves a gap for m…

huggingface Score 19.5

DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation

2026-04-16 · Qianqian Xie, Qingheng Xiong, He Zhu, Tiantian Xia, Xueming Han, Fanyu Meng, Jiakai Wang, Zhiqi Bai, Chengkang Jiang, Zhaohui Wang, Yubin Guo, Yuqing Wen, Jiayang Mao, Zijie Zhang, Shihao Li, Yanghai Wang, Yuxiang Ren, Junlan Feng, Jiaheng Liu

General AI

Deep Research Agents (DRAs) aim to solve complex, long-horizon research tasks involving planning, retrieval, multimodal understanding, and report generation, yet their evaluation remains challenging due to dynamic web environments and ambiguous task definitions. We propose DR³-Eval, a realistic and reproducible benc…

huggingface Score 19.5

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

2026-04-16 · Jun Wang, Shuo Tan, Zelong Sun, Tiancheng Gu, Yongle Zhao, Ziyong Feng, Kaicheng Yang, Cewu Lu

General AI

Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG systems typically rely on generic retrieval signals that overlook the fine-grained visual semantics essential for complex reasoning. To address this limitation, we propose UniDo…

arxiv Score 19.5

Hidden Failure Modes of Gradient Modification under Adam in Continual Learning, and Adaptive Decoupled Moment Routing as a Repair

2026-04-24 · Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song

Research Track A · General AI

Many continual-learning methods modify gradients upstream (e.g., projection, penalty rescaling, replay mixing) while treating Adam as a neutral backend. We show this composition has a hidden failure mode. In a high-overlap, non-adaptive 8-domain continual LM, all shared-routing projection baselines collapse close to va…

arxiv Score 19.5

Attribution-Guided Continual Learning for Large Language Models

2026-05-06 · Yazheng Liu, Yuxuan Wan, Rui Xu, Xi Zhang, Sihong Xie, Hui Xiong

Research Track A · General AI

Large language models (LLMs) often suffer from catastrophic forgetting in continual learning: after learning new tasks sequentially, they perform worse on earlier tasks. Existing methods mitigate catastrophic forgetting by data replay, parameter freezing, or regularization. However, these methods lack semantic awarenes…

arxiv Score 19.5

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

2026-06-04 · Parth Asawa, Christopher M. Glaze, Gabriel Orlanski, Ramya Ramakrishnan, Benji Xu, Asim Biswal, Vincent Sunn Chen, Frederic Sala, Matei Zaharia, Joseph E. Gonzalez

Research Track A · General AI

Continual learning, the ability of AI systems to improve through sequential experience, has attracted substantial interest, but no high-quality benchmark exists to evaluate it. We introduce Continual Learning Bench (CL-Bench), the first difficult, expert-validated benchmark designed to measure whether LLM-based systems…

huggingface Score 19.5

SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks

2026-06-14 · Jingru Guo, Xiangyuan Xue, Lian Zhang, Wanghan Xu, Siki Chen, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin

General AI

Frontier scientific reasoning remains a major challenge for large language models (LLMs), where even the strongest commercial systems fall short of expert-level performance. A closer look at model behavior reveals substantial complementarity that single-model evaluation hides: different frontier models excel on differe…

huggingface Score 19.5

Guava: An Effective and Universal Harness for Embodied Manipulation

2026-06-16 · Haowen Liu, Xirui Li, Shaoxiong Yao, Peng Shi, Tianyi Zhou, Jia-Bin Huang, Furong Huang, Jiayuan Mao

General AI

Language models trained on large-scale vision-language data have demonstrated strong potential for embodied agents. Harnessing models through embodied tool use offers a promising alternative to end-to-end vision-language-action systems by combining high-level reasoning with external modules for perception, planning, a…

arxiv Score 19.5

Social World Model for Lifelong Social Intelligence

2026-06-19 · Yu Luo

Research Track A · General AI

Social intelligence is a core competency for language agents, yet current research primarily focuses on static capability evaluation rather than how these skills are continuously shaped and accumulated. This gap calls for a shift toward sustainable learning paradigms. Currently, two methodological pain points exist: so…

arxiv Score 19.4

LiMoDE: Rethinking Lifelong Robot Manipulation from a Mixture-of-Dynamic-Experts Perspective

2026-06-24 · Zhihao Gu, Lin Wang

Research Track A · General AI

Building a generalist robot that can leverage prior knowledge for continuous task adaptation remains a significant challenge. Previous works alleviate the catastrophic forgetting problem by parameter-efficient fine-tuning for single-task adaptation. However, they fail to extract reusable skills and model the interactio…

huggingface Score 19.4

TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents

2026-06-26 · Shoufa Chen, Luyuan Wang, Xuan Yang, Zhiheng Liu, Yuren Cong, Yuanfeng Ji, Feiyan Zhou, Xiaohui Zhang, Fanny Yang, Belinda Zeng

General AI

As large language models and harness frameworks continue to advance, agents operating in terminals are increasingly capable of performing a broader range of general computer-use tasks beyond coding. However, existing benchmarks do not adequately evaluate general-purpose terminal computer-use agents (TUAs): general comp…

huggingface Score 19.4

Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning

2026-06-26 · Hohin Kwan, Hongyu Li, Ray Zhang, Manyuan Zhang, Xianghao Kong, Anyi Rao, Jiahao Xie, Si Liu

General AI

Recent interest in multimodal large language models (MLLMs) raises a central question: can they reason over dynamic visual evidence rather than merely recognize objects or events in individual frames? This ability, which we refer to as video temporal-logical reasoning, requires models to maintain, update, and compose e…

arxiv Score 19.3

Optimizing LLM Prompt Engineering with DSPy Based Declarative Learning

2026-04-06 · Shiek Ruksana, Sailesh Kiran Kurra, Thipparthi Sanjay Baradwaj

General AI

Large Language Models (LLMs) have shown strong performance across a wide range of natural language processing tasks; however, their effectiveness is highly dependent on prompt design, structure, and embedded reasoning signals. Conventional prompt engineering methods largely rely on heuristic trial-and-error processes, …

arxiv Score 19.3

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

2026-04-09 · Ziwei Zhou, Zeyuan Lai, Rui Wang, Yifan Yang, Zhen Xing, Yuqing Yang, Qi Dai, Lili Qiu, Chong Luo

General AI

Text-to-Audio-Video (T2AV) generation is rapidly becoming a core interface for media creation, yet its evaluation remains fragmented. Existing benchmarks largely assess audio and video in isolation or rely on coarse embedding similarity, failing to capture the fine-grained joint correctness required by realistic prompt…

arxiv Score 19.3

EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools

2026-04-09 · Boer Zhang, Mingyan Wu, Dongzhuoran Zhou, Yuqicheng Zhu, Wendong Fan, Puzhen Zhang, Zifeng Ding, Guohao Li, Yuan He

Research Track B · General AI

Deep research requires reasoning over web evidence to answer open-ended questions, and it is a core capability for AI agents. Yet many deep research agents still rely on implicit, unstructured search behavior that causes redundant exploration and brittle evidence aggregation. Motivated by Anthropic's "think" tool parad…

arxiv Score 19.3

Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents

2026-04-14 · Benjamin Stern, Peter Nadel

General AI

LLM agents with persistent memory store information as flat factual records, providing little context for temporal reasoning, change tracking, or cross-session aggregation. Inspired by the drawing effect [3], we introduce dual-trace memory encoding. In this method, each stored fact is paired with a concrete scene trace…

arxiv Score 19.3

From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents

2026-04-21 · Md Nayem Uddin, Kumar Shubham, Eduardo Blanco, Chitta Baral, Gengyu Wang

Research Track A · General AI

Personalized agents that interact with users over long periods must maintain persistent memory across sessions and update it as circumstances change. However, existing benchmarks predominantly frame long-term memory evaluation as fact retrieval from past conversations, providing limited insight into agents' ability to …

arxiv Score 19.3

Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering

2026-04-29 · Happy Bhati

General AI

The arrival of large language models (LLMs) capable of multi-step reasoning, tool use, and long-horizon planning has produced a qualitative shift in software engineering. Where earlier code-completion tools such as GitHub Copilot operated at the granularity of a line or function, modern agentic systems -- Claude Code, …

arxiv Score 19.3

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

2026-05-01 · Derong Xu, Shuochen Liu, Pengfei Luo, Pengyue Jia, Yingyi Zhang, Yi Wen, Yimin Deng, Wenlin Zhang, Enhong Chen, Xiangyu Zhao, Tong Xu

General AI

Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memor…

arxiv Score 19.3

Where Does the Answer Come From? Benchmarking View-Level Visual Evidence Identification in Multi-View MLLMs for Autonomous Driving

2026-06-08 · Yimu Wang, Yee Man Choi, Barry Zhang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki

General AI

Multimodal large language models (MLLMs) achieve strong results on visual reasoning benchmarks, but answer accuracy alone does not indicate whether a model relied on the correct visual evidence. This gap is particularly important in multi-view driving scenes used for autonomous driving, where a model can produce a plau…

arxiv Score 19.2

InvestPhilBench: A Multi-Layer Dynamic Benchmark for Evaluating Large Language Model Procedural Reasoning in Expert Investment Philosophy

2026-06-24 · Mingguang Chen, Bo Qu

General AI

Large language models are increasingly deployed as investment research assistants, yet no benchmark tests whether they can accurately reconstruct and apply the specific procedural decision frameworks of expert investors. We introduce InvestPhilBench, a multi-layer dynamic benchmark spanning eight cognitive tiers, from …

arxiv Score 19.2

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

2026-06-24 · Changdae Oh, Wendi Li, Seongheon Park, Samuel Yeh, Tanwi Mallick, Sharon Li

General AI

Process reward models enable fine-grained, step-level evaluation of LLMs, yet building them for agentic settings remains prohibitively difficult: long-horizon interactions, irreversible actions, and stochastic environment feedback make both human annotation and Monte Carlo estimation infeasible at scale. In this work, …

arxiv Score 19.2

Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

2026-06-24 · Yupu Hao, Zhuoran Jin, Huanxuan Liao, Kang Liu, Jun Zhao

General AI

Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited gains in tool-use tasks. In our experiments, some models exhibit catastrophic collapse, wh…

arxiv Score 19.0

FeDMRA: Federated Incremental Learning with Dynamic Memory Replay Allocation

2026-03-30 · Tiantian Wang, Xiang Xiang, Simon S. Du

Research Track A · General AI

In federated healthcare systems, Federated Class-Incremental Learning (FCIL) has emerged as a key paradigm, enabling continuous adaptive model learning among distributed clients while safeguarding data privacy. However, in practical applications, data across agent nodes within the distributed framework often exhibits n…

arxiv Score 19.0

GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

2026-04-06 · Yuwen Zhai, Runze Li, Liang Wang, Nian Shi, Liwu Xu, Wei Zhang, Ran Lin, Bo Xu, Benlei Cui

Research Track B · General AI

Evaluating GUI agents presents a distinct challenge: trajectories are long, visually grounded, and open-ended, yet evaluation must be both accurate and interpretable. Existing approaches typically apply a single holistic judgment over the entire action-observation sequence, a strategy that proves unreliable on long-hori…

arxiv Score 19.0

Task Switching Without Forgetting via Proximal Decoupling

2026-04-20 · Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, William A. P. Smith, Yue Lu

Research Track A · General AI

In continual learning, the primary challenge is to learn new information without forgetting old knowledge. A common solution addresses this trade-off through regularization, penalizing changes to parameters critical for previous tasks. In most cases, this regularization term is directly added to the training loss and o…

huggingface Score 19.0

StepAudio 2.5 Technical Report

2026-05-22 · Bin Lin, Bo Zhao, Boyong Wu, Chao Yan, Chen Wu, Cheng Yi, Chengyuan Yao, Daijiao Liu, Fei Tian, Feng Tian, Haiyang Sun, Haoyang Zhang, Jiangjie Zhen, Jinglan Gong, Jun Chen, Li Xie, Peilin Li, Peng Yang, Pengfei Tan, Qingjian Lin, Runze Li, Shenghua Hu, Siyi Zhou, Wenwen Qu, Xiangyu Li, Xiangyu Tony Zhang, Xuerui Yang, Yang Yang, Yechang Huang, Yu Fu, Yuchu Luo, Yuxin Li, Yuxin Zhang, Zhengyan Sheng, Brian Li, Chang Zeng, Changlin Zhang, Chen Geng, Chenghao Dong, Chengli Feng, Dan Zhou, Danni Wan, Di Chen, Die Zhang, Dongqing Pang, Guanglong Yang, Guoqiang Hu, Huangxi Zhu, Jianzheng Gao, Jinghua Liang, Jinmei Wan, Junjie Yuan, Kang An, Lei Lei, Limin Zhong, Lun Cai, Mengqiang Ren, Min Xu, Mingliang Li, Mingxiao Li, Na Wang, Qiang Tong, Qiaoling Huang, Qingfu Du, Rui Wang, Shengchen Zhou, Shi Qiu, Shihao Peng, Shiliang Yang, Siqi Tu, Tianjiao Deng, Ting Xu, Tong Wang, WeiMing Niu, Wuxun Xie, Xianwei Zhang, Xianyu Feng, Xiaojia Liu, Xing Chen, Xiongbin Wu, Yan Wu, Yang Li, Yi Liu, Yifan Zhang, Yile Liu, Yongshen Long, Yu Luo, Yuanhao Ding, Yuhao Wang, Yuhe Yin, Yunfang Xu, Yuxiang Yang, Zhiguo Huang, Zhiyue Wu, Zichao Li, Zichao Zhou, Daxin Jiang, Future Li, Gang Yu, Xiangyu Zhang, Yibo Zhu

General AI

Unified audio-language modeling has emerged as a prominent trend in modern speech systems, promising to bring the reasoning capabilities of large language models to auditory tasks. However, existing unified foundations often struggle to match the depth of specialized systems across automatic speech recognition (ASR), t…

arxiv Score 19.0

Revisiting Observation Reduction for Web Agents: Comprehensive Evaluation with a Lightweight Framework

2026-05-28 · Masafumi Enomoto, Ryoma Obara, Haochen Zhang, Masafumi Oyamada

Research Track B · General AI

HTML observations in LLM-based web agents are extremely long, and while many reduction methods have been proposed, it remains unclear which methods reduce overall agent latency while maintaining performance. The main obstacle is the high cost of end-to-end evaluation: in our experiments, evaluating 11 methods across 32…

huggingface Score 19.0

iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning

2026-05-29 · Chang-Bin Zhang, Yujie Zhong, Qiang Zhang, Kai Han

General AI

While visually grounded Chain-of-Thought (CoT) has emerged as a promising paradigm to enhance fine-grained perception in multimodal large language models (MLLMs), its efficacy during the inference phase remains underexplored. In this work, we empirically find that mandating explicit object boxes in visually grounded Co…

arxiv Score 19.0

Crafting Your Evolving Dreams: Concept-Incremental Versatile Customization

2026-06-03 · Jiahua Dong, Wenqi Liang, Hongliu Li, Yang Cong, Duzhen Zhang, Hanbin Zhao, Henghui Ding, Yulun Zhang, Salman Khan, Fahad Shahbaz Khan

Research Track A · General AI

Custom diffusion models (CDMs) have garnered significant interest owing to their remarkable capacity for generating personalized concepts. However, the majority of CDMs unrealistically presume that the user's collection of personalized concepts is static and incapable of incremental growth over time. Furthermore, they …

arxiv Score 19.0

Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

2026-06-03 · Jiaxi Li, Ke Deng, Yun Wang, Jingyuan Huang, Yucheng Shi, Qiaoyu Tan, Jin Lu, Ninghao Liu

Research Track B · General AI

Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce skills from previous task trajectories and reuse them in future tasks on the fly. However, existing methods mainly reuse s…

arxiv Score 19.0

Bridging Geographic Bias in Urban Streetscape Inference via Lifelong Learning with Visual-Semantic Pivoting

2026-06-13 · Xinze Zhang

Research Track A · General AI

Visual perception of urban streetscapes underpins evidence-based decisions in landscape planning, public health, and place-making. Yet models trained on a few well-photographed metropolises systematically misjudge underrepresented districts, propagating geographic bias into downstream policy. We address this gap with H…

arxiv Score 19.0

KeepLoRA++: Continual Learning with Layer-Scaled Residual Gradient Adaptation

2026-06-15 · Mao-Lin Luo, Yi-Lin Zhang, Zi-Hao Zhou, Yankun Hong, Xialiang Tong, Mingxuan Yuan, Tong Wei, Min-Ling Zhang

Research Track A · General AI

Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents KeepLoRA++, balancing these objectives through a u…

huggingface Score 19.0

DuoMem: Towards Capable On-Device Memory Agents via Dual-Space Distillation

2026-06-29 · Peyman Hosseini, Ondrej Bohdal, Ahmed Alajrami, Andrea Maracani, Ignacio Castro, Matthew Purver, Mete Ozay, Savas Ozkan, Taha Ceritli

General AI

Large Language Model (LLM)-based agents can solve complex procedural tasks by interacting with environments over multiple turns, but this ability typically depends on large models, long contexts, and repeated inference calls. This makes advanced memory-augmented agents difficult to deploy on resource-constrained device…

arxiv Score 18.9

Attention-Spectrum Regularization for Replay-Free Continual Multimodal LLMs

2026-06-22 · Chuangxin Zhao, Canran Xiao, Siyuan Ma, Mengyao Lyu, Yanbiao Ma, Jun Xia, Guiguang Ding, Yang Liu

Research Track A · General AI

Multimodal large language models (MLLMs) are increasingly required to adapt to non-stationary streams of visual domains, question types, and user instructions, yet continual fine-tuning often causes severe forgetting of previously acquired multimodal skills. Existing continual vision-language methods mainly preserve ou…

arxiv Score 18.8

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

2026-05-07 · Tianle Wang, Zhaoyang Wang, Guangchen Lan, Xinpeng Wei, Sipeng Zhang, Guanwen Qiu, Abulhair Saparov

General AI

Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that offers independent …

arxiv Score 18.8

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

2026-05-12 · Runhui Huang, Jie Wu, Rui Yang, Zhe Liu, Hengshuang Zhao

General AI

In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model's intrinsic potential to perform advanced re…

arxiv Score 18.8

Mandol: An Agglomerative Agent Memory System for Long-Term Conversations

2026-06-29 · Yuhan Zhang, Zhiyuan Guo, Ziheng Zeng, Wei Wang, Wentao Wu, Lijie Xu

General AI

Long-term conversational agents need to remember and query cross-session, multi-typed information with complex correlations. Existing agent memory systems rely on heterogeneous vector and graph databases, which fragment memory information and cause high cross-database I/O latency. For retrieval, common RAG-style method…

arxiv Score 18.8

Training Vision-Language-Action Models with Dense Embodied Chain-of-Thought Supervision

2026-06-29 · Haoyang Li, Guanlin Li, Youhe Feng, Chen Zhao, Zhuoran Wang, Yang Li, Qizhe Wei, Shifeng Bao, Haitao Shen, Yihan Zhao, Tong Yang, Jing Zhang

General AI

Cross-embodiment transfer in vision-language-action (VLA) models remains challenging because low-level state and action spaces differ fundamentally across robot platforms. We observe that the high-level cognitive process underlying manipulation, including scene perception, object identification, task planning, and sub-…

arxiv Score 18.8

Denser ≠ Better: Limits of On-Policy Self-Distillation for Continual Post-Training

2026-07-02 · Meng Wang, Haohan Zhao, Wenzhuo Liu, Lu Yang, Geng Liu, Haiyang Guo, Guo-Sen Xie, Gaofeng Meng, Hongbin Liu, Fei Zhu

Research Track A · General AI

Continual post-training enables foundation models to acquire new knowledge while preserving existing capabilities. Recent work suggests that on-policy learning can mitigate forgetting, with on-policy self-distillation emerging as a particularly attractive approach. In this work, we revisit this optimistic view through …

arxiv Score 18.5

cotomi Act: Learning to Automate Work by Watching You

2026-05-04 · Masafumi Oyamada, Kunihiro Takeoka, Kosuke Akimoto, Ryoma Obara, Masafumi Enomoto, Haochen Zhang, Daichi Haraguchi, Takuya Tamura

Research Track B · General AI

What if a browser agent could learn your work simply by watching you do it? We present cotomi Act, a browser-based computer-using agent that combines reliable multi-step task execution with persistent organizational knowledge learned from user behavior. For execution, an agent scaffold with adaptive lazy observation, v…

arxiv Score 18.5

Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics

2026-05-06 · Andreas Pattichis, Constantine Dovrolis

Research Track A · General AI

LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen wha…

arxiv Score 18.5

Joint sparse coding and temporal dynamics support context reconfiguration

2026-05-11 · Qianqian Shi, Yue Che, Faqiang Liu, Hongyi Li, Mingkun Xu, Sandra Reinert, Pieter M. Goltstein, Rong Zhao, Luping Shi

Research Track A

Adaptive behavior requires the brain to transition between distinct contexts while maintaining representations of prior experience. The ability to reconfigure neural representations without erasing previously acquired knowledge is central to learning in dynamic environments, yet the neural mechanisms that support this …

arxiv Score 18.5

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

2026-05-12 · Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung, Koushik Sen, Dawn Song

Research Track B · General AI

Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue that benchmarks must be se…

arxiv Score 18.5

Learning When to Adapt

2026-05-18 · Ali Zindari, Xiaowen Jiang, Rotem Mulayoff, Sebastian U. Stich

Research Track A · General AI

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable compromise between adapting to the fine-tuning distribution and preserving pre-trained behavior…

arxiv Score 18.5

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

2026-05-24 · Yubo Li, Yidi Miao, Yuntian Shen, Yuxin Liu

Research Track B · General AI

Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist model stacks. This raises a central question: can a web agent become more efficient as it accumulates experience, rather than more expensive? We…

arxiv Score 18.5

Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting

2026-05-28 · Runze Xu, Arpit Garg, Hemanth Saratchandran, Simon Lucey

Research Track A · General AI

Low-Rank Adaptation (LoRA) has become one of the most widely used fine-tuning mechanisms for adapting large language models to new domains, tasks, and users. Yet adaptation performance alone can obscure an important failure mode: LoRA updates may improve performance on the target distribution while degrading prior capa…

huggingface Score 18.5

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

2026-05-29 · Qian Kou, Xiaofeng Shi, Yulin Li, Xiaosong Qiu, Xinyang Wang, Hua Zhou, Cao Dongxing

General AI

Multimodal Large Language Models (MLLMs) have demonstrated significant achievements in general visual question answering (VQA) tasks. However, they remain brittle on mechanical engineering drawings, where high annotation density and weak domain knowledge, compounded by unreliable spatial relation reasoning under strict…

arxiv Score 18.5

Federated continual learning: A comprehensive survey on lifelong and privacy-preserving learning over distributed and non-stationary data

2026-06-09 · Masoume Gholizade, Fabrizio Ruffini, Pietro Ducange, Francesco Marcelloni

Research Track A · General AI

Federated Learning (FL) enables collaborative and privacy-preserving model training across distributed clients, but most existing FL systems implicitly assume data stationarity. In real-world settings such as healthcare, industrial IoT (IIoT), cybersecurity, and smart cities, data streams are inherently non-stationary, …

huggingface Score 18.5

Kwai Keye-VL-2.0 Technical Report

2026-06-09 · Kwai Keye Team, Bin Wen, Changyi Liu, Chengru Song, Chongling Rao, Guowang Zhang, Han Li, Haonan Fan, Hengrui Ju, Jiankang Chen, Jiapeng Chen, Jiawei Yuan, Kaixuan Yang, Kaiyu Jiang, Kun Gai, Lingzhi Zhou, Na Nie, Sen Na, Tianke Zhang, Tingting Gao, Xuanyu Zheng, Yulong Chen, Fan Yang, Haixuan Gao, Lele Yang, Mingqiao Liu, Muxi Diao, Qi Zhang, Qile Su, Wei Chen, Wentao Hong, Xingyu Lu, Yancheng Long, Yankai Yang, Yingxin Li, Yiyang Fan, Yu Xia, Yuzhe Chen, Ziliang Lai, Chuan Yi, Haonan Jia, Tianming Liang, Weixin Xu, Xiaoxiao Ma, Yang Tian, Yufei Han, Feng Han, Hang Li, Jing Wang, Jinghui Jia, Junmin Chen, Junyu Shi, Ruilin Zhang

General AI

We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, information redundancy, and prohibitive computational costs inherent in hour-level videos, K…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 18.5

IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products

2026-06-12 · Haonan Qi, Jin Cao, Yongqi Zhang, Xintong Wang, Weidong Tang, Bin Chen, Chengfu Huo, Haojun Pan, Hengyu You, Jing Li, Yingde Wang, Liang Ding

General AI

Industrial products such as valves and circuit breakers are defined by dense technical specifications that govern procurement, compatibility, and safety across supply chains. These specifications are scattered across multiple heterogeneous product images, including specification tables, nameplates, and technical drawin…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.5

Learning New Tasks via Reusable Skills: Skill-Compositional Experts for Embodied Continual Learning

2026-06-14 · Shuaike Zhang, Shaokun Wang, Haoyu Tang, Jianlong Wu, Liqiang Nie

Research Track A · General AI

Embodied Continual Learning (ECL) aims to enable robots to continually acquire new manipulation tasks while retaining previously learned behaviors under closed-loop control. Compared with conventional continual learning, ECL suffers from more severe catastrophic forgetting. Feature drift accumulated under closed-loop c…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

2026-03-15 · Mohamed Aghzal, Gregory J. Stein, Ziyu Yao

Research Track B · General AI

Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze w…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Demographic Fairness in Multimodal LLMs: A Benchmark of Gender and Ethnicity Bias in Face Verification

2026-03-26 · Ünsal Öztürk, Hatef Otroshi Shahreza, Sébastien Marcel

General AI

Multimodal Large Language Models (MLLMs) have recently been explored as face verification systems that determine whether two face images are of the same person. Unlike dedicated face recognition systems, MLLMs approach this task through visual prompting and rely on general visual and reasoning abilities. However, the d…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents

2026-03-26 · Cristian Lupascu, Alexandru Lupascu

Research Track A · General AI

Large Language Model-based agents increasingly operate in high-stakes, multi-turn settings where factual grounding is critical, yet their memory systems typically rely on flat key-value stores or plain vector retrieval with no mechanism to track the provenance or trustworthiness of stored knowledge. We present Elephant…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning

2026-03-30 · Ziqi Miao, Haonan Jia, Lijun Li, Chen Qian, Yuan Xiong, Wenting Yan, Jing Shao

General AI

Reinforcement learning with verifiable rewards (RLVR) has substantially enhanced the reasoning capabilities of multimodal large language models (MLLMs). However, existing RLVR approaches typically rely on outcome-driven optimization that updates both perception and reasoning using a shared reward based solely on the fi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models

2026-03-31 · Md Saad, Sajjad Hussain, Mohd Suhaib

General AI

This paper introduces a new hybrid framework that combines Reinforcement Learning (RL) and Large Language Models (LLMs) to improve robotic manipulation tasks. By utilizing RL for accurate low-level control and LLMs for high-level task planning and understanding of natural language, the proposed framework effectively co…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

2026-04-13 · Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

Research Track B · General AI

GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

2026-04-13 · Xiaozhe Li, Tianyi Lyu, Yizhao Yang, Liang Shan, Siyi Yang, Ligao Zhang, Zhuoyi Huang, Qingwen Liu, Yang Li

Research Track B · General AI

Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context manag…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving

2026-04-13 · Haojie Bai, Aimin Li, Ruoyu Yao, Xiongwei Zhao, Tingting Zhang, Xing Zhang, Lin Gao, Jun Ma

General AI

Closed-loop cooperative driving requires planners that generate realistic multimodal multi-agent trajectories while improving safety and traffic efficiency. Existing diffusion planners can model multimodal behaviors from demonstrations, but they often exhibit weak scene consistency and remain poorly aligned with closed…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench

2026-04-16 · Ke Xu, Yuhao Wang, Yu Wang

General AI

Recent advancements in LLM agents are gradually shifting from reactive, text-based paradigms toward proactive, multimodal interaction. However, existing benchmarks primarily focus on reactive responses, overlooking the complexities of proactive intervention and monitoring. To bridge this gap, we introduce ProVoice-Benc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

2026-04-16 · Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo

Research Track B · General AI

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often lea…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models

2026-04-22 · Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele

General AI

Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated great potential for enhancing the reasoning abilities of multimodal large language models (MLLMs). However, the reliance on language-centric priors and expensive manual annotations prevents MLLMs' intrinsic visual understanding and scalable …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Exploration Hacking: Can LLMs Learn to Resist RL Training?

2026-04-30 · Eyon Jang, Damon Falck, Joschka Braun, Nathalie Kirch, Achu Menon, Perusha Moodley, Scott Emmons, Roland S. Zimmermann, David Lindner

General AI

Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment. Successful RL relies on sufficient exploration of diverse actions by the model during training, which creates a potential failure mode: a model could strategically alt…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Benchmark Everything Everywhere All at Once

2026-06-04 · Shiyun Xiong, Dongming Wu, Peiwen Sun, Yuang Ai, Bokang Yang, Wencheng Han, Xiao-Hui Li, Xiangyu Yue

General AI

Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly reach performance sa…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Self-Augmenting Retrieval for Diffusion Language Models

2026-06-04 · Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go, Kilian Q. Weinberger

General AI

Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the unconfident ones. We show that the discarded tokens are in fact a useful …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

2026-06-05 · Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu

Research Track B · General AI

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning, generating…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity

2026-06-09 · Andrew Bo Liu, Samira Nedungadi, Bryce Cai, Alex Kleinman, Harmon Bhasin, Seth Donoughe

General AI

Large language models (LLMs) are rapidly acquiring capabilities relevant to biological research, from literature synthesis to interpretation of experimental data. Increasingly, LLM agents can also perform in silico biology tasks that previously required experienced human biologists. These emerging AI capabilities offer…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning

2026-06-09 · Yikang Yang, Zhanpeng Hu, Youtian Lin, Mengqi Zhou, Jingxi Xu, Feihu Zhang, Jiaheng Liu, Yao Yao

General AI

Multimodal large language models can write code to produce complex programs as well as use programs to do 3D modeling, which opens up a new avenue for 3D generation powered by their priors, world knowledge and reasoning. Yet existing benchmarks rarely evaluate 3D modeling through code. Such modeling demands more than r…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Agents-K1: Towards Agent-native Knowledge Orchestration

2026-06-11 · Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang, Fangchen Yu, Zhijie Zhong, Zijie Guo, Tianshuo Peng, Zhuo Liu, Yi Xie, Xiang Zhuang, Yue Fan, Runmin Ma, Shiyang Feng, Xiangchao Yan, Anran Liu, Peng Ye, Wenlong Zhang, Shufei Zhang, Chunfeng Song, Fenghua Ling, Jie Zhou, Liang He, Bo Zhang, Lei Bai

General AI

Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat "cites" edges, omitting key entities, claims, evidence, mechanisms, and method lineages essential for s…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

InterleaveThinker: Reinforcing Agentic Interleaved Generation

2026-06-11 · Dian Zheng, Harry Lee, Manyuan Zhang, Kaituo Feng, Zoey Guo, Ray Zhang, Hongsheng Li

General AI

Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual narratives, guidance, a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification

2026-06-15 · Truong Thanh Hung Nguyen, Khanh Van Quynh Nguyen, Hoang-Loc Cao, Tri Duong, Phuc Ho, Van Pham, Loc Nguyen, Hung Cao

General AI

Accurate Harmonized Tariff Schedule (HTS) code classification is essential for customs clearance, duty assessment, trade statistics, and regulatory compliance in maritime logistics. However, exact HTS classification remains challenging because product descriptions are often short, incomplete, or ambiguous, while correc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.3

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

2026-06-17 · Mohamed Nabail, Leo Cheng, Jingmin Wang, Nicholas Rhinehart

General AI

Preference-based RL provides an approach to learning reward models from pairwise comparisons of behaviors, bypassing the need for explicit reward design. However, existing methods typically rely on passive data collection and suffer from poor sample efficiency, especially during the early stages of learning. We introdu…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.2

SHERLOC: Structured Diagnostic Localization for Code Repair Agents

2026-06-23 · Hovhannes Tamoyan, Sean Narenthiran, Erik Arakelyan, Mira Mezini, Boris Ginsburg

General AI

LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedicated localization frameworks have emerged, yet are still evaluated as file retrieval rather than actionable diagnosis, producing locations without the diagnostic context a re…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.2

UniDrive: A Unified Vision-Language and Grounding Framework for Interpretable Risk Understanding in Autonomous Driving

2026-06-23 · Xiaowei Gao, Pengxiang Li, Yitai Cheng, Ruihan Xu, James Haworth, Stephen Law, Yun Ye

General AI

Recent multimodal large language models (MLLMs) have shown strong potential for autonomous driving scene understanding, yet existing methods still face a fundamental trade-off between temporal reasoning and spatial precision. Models that rely on single-frame or low-resolution inputs often miss small, distant, or partia…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.0

ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution

2026-01-12 · Jihong Wang, Jiamu Zhou, Weiming Zhang, Weiwen Liu, Zhuosheng Zhang, Xingyu Lou, Weinan Zhang, Huarong Deng, Jun Wang

Research Track B · General AI

With the advancement of vision-language models, web automation has made significant progress. However, deploying autonomous agents in real-world settings remains challenging, primarily due to site heterogeneity, where generalist models lack domain-specific priors for diverse interfaces, and long-horizon instability, ch…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.0

Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents

2026-03-09 · Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang

Research Track B · General AI

Modern agents powered by thinking LLMs achieve high accuracy through long chain-of-thought reasoning but incur substantial inference costs. While many LLMs now support configurable reasoning levels (e.g., high/medium/low), static strategies are often ineffective: using low-effort modes at every step leads to significan…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.0

Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents

2026-04-03 · Wei Zou, Mingwen Dong, Miguel Romero Calvo, Shuaichen Chang, Jiang Guo, Dongkyu Lee, Xing Niu, Xiaofei Ma, Yanjun Qi, Jiarong Jiang

Research Track B · General AI

Memory makes LLM-based web agents personalized, powerful, yet exploitable. By storing past interactions to personalize future tasks, agents inadvertently create a persistent attack surface that spans websites and sessions. While existing security research on memory assumes attackers can directly inject into memory stor…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.0

CoMemNet: Contrastive Sampling with Memory Replay Network for Continual Traffic Prediction

2026-05-07 · Mei Wu, Wenchao Weng, Wenxin Su, Wenjie Tang, Wei Zhou

Research Track A · General AI

In recent years, the integration of non-topological space modeling with temporal learning methods has emerged as an effective approach for capturing spatio-temporal information in non-Euclidean graphs. However, most existing methods rely on static underlying graph structures, which are inadequate for capturing the cont…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.0

Consolidation-Expansion Operator Mechanics: A Unified Framework for Adaptive Learning

2026-05-11 · Debashis Guha

Research Track A · General AI

Every adaptive learning system must alternate between two operations: consolidating what it already knows and expanding into new evidence. We propose Consolidation-Expansion Operator Mechanics (OpMech), a framework that makes this structure precise. The central object is the order-gap $\Ogap(θ; e)$, the d…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.0

LabOSBench: Benchmarking Computer Use Agents for Scientific Instrument Control

2026-06-15 · Anqi Zou, Han Deng, Chengyu Zhang, Junquan Hu, Yu Wang, Yuxiang Xing, Aokai Zhang, Hanling Zhang, Zhaoyang Liu, Ben Fei, Zhihui Wang, Wanli Ouyang

Research Track B · General AI

Current computer-use benchmarks primarily focus on software operation tasks in virtualized systems, whereas scientific instrumentation scenarios require coordinated control over complex interfaces and feedback-driven parameter adjustment. However, directly evaluating agents on physical high-precision instruments is im…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 18.0

EVAF: A Test-Retest Protocol for Selective Parametric Consolidation

2026-06-29 · Haoliang Han

Research Track A · General AI

Long-running language agents need mechanisms for deciding which experiences should persist after the working context is gone. Retrieval systems can reinsert past text, but they do not by themselves show that an experience has been selectively consolidated into the model's own behavior. We introduce EVAF, an Echo-Valenc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.9

Forget to Improve: On-Device LLM-Agent Continual Learning via Budget-Curated Memory

2026-06-23 · Beining Wu, Zihao Ding, Jun Huang, Yanxiao Zhao

Research Track A · General AI

On-device language-model agents improve by accumulating experience in retrieved memory rather than by updating weights. This memory is hard-bounded and exposed: it consumes RAM and energy, reaches peers through a thin uplink, and becomes an attack surface because it is writable by what the agent reads. Existing systems…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.8

FT-RAG: A Fine-grained Retrieval-Augmented Generation Framework for Complex Table Reasoning

2026-05-02 · Zebin Guo, Weidong Geng, Ruichen Mao

General AI

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding responses in external knowledge during inference. However, conventional RAG systems underperform on structured tabular data, largely due to coarse retrieval granularity and insufficient table semantic comprehension. To address these…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.8

Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images

2026-05-12 · Yuangong Chen, Wai Keung Wong, Jiaxing Li, Ioannis Patras, Xu Zheng

General AI

Multimodal Large Language Models (MLLMs) show strong visual perception, yet remain limited in reasoning about space under changing viewpoints. We study this challenge as Perspective-Conditioned Spatial Reasoning (PCSR) in 360-degree omnidirectional images, where broad scene coverage reduces ambiguity from partial obser…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.8

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

2026-05-12 · Alireza Nadali, Patrick Cooper, Ashutosh Trivedi, Alvaro Velasquez

General AI

We introduce KV-Fold, a simple, training-free long-context inference protocol that treats the key-value (KV) cache as the accumulator in a left fold over sequence chunks. At each step, the model processes the next chunk conditioned on the accumulated cache, appends the newly produced keys and values, and passes the enl…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.8

PhotoFlow: Agentic 3D Virtual Photography Missions

2026-05-22 · Jiarui Guo, Haojia Wei, Yiming Zhang, Yifei Liu, Yuning Gong, Hongjie Zhang, Xue Yang, Zhihang Zhong

General AI

Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes this kind of spatia…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.8

Rethinking Memory as Continuously Evolving Connectivity

2026-05-27 · Jizhan Fang, Buqiang Xu, Zhixian Wang, Haoliang Cao, Xinle Deng, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Ying Wei, Guozhou Zheng, Feiyu Xiong, Haofen Wang, Huajun Chen, Ningyu Zhang

Research Track A · General AI

Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines, which is brittle in dynamic agentic environments where feedback, task variation, and heterogeneous signals continuously reshape what should be remembered and how it should be co…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.8

RoboWits: Unexpected Challenges for Robotic Creative Problem Solving

2026-05-28 · Chunru Lin, Hongxin Zhang, Fenghao Yu, Zhehuan Chen, Thomas L. Griffiths, Yejin Choi, David Held, Chuang Gan

General AI

The ability to reason, adapt, and creatively solve problems under unexpected challenges is essential for robots operating in real-world environments. However, current robotic benchmarks primarily emphasize skill-level execution and provide limited insight into such cognitive reasoning capabilities. We introduce RoboWit…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.8

Geometry-Preserving Orthonormal Initialization for Low-Rank Adaptation in RLVR

2026-06-30 · Ruijia Zhang, Jiacheng Zhu, Hanqing Zhu, Laixi Shi

General AI

Low-rank adaptation (LoRA) and its variants enable parameter-efficient fine-tuning of large language models under the supervised fine-tuning (SFT) paradigm. However, their efficacy and behavior under reinforcement learning with verifiable rewards (RLVR) are less well understood. In particular, two structurally initiali…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 17.7

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

2026-04-23 · Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie

Research Track B · General AI

Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic framework built around three integrated comp…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.6

DemoPSD: Disagreement-Modulated Policy Self-Distillation

2026-07-02 · Yunhe Li, Hao Shi, Wenhao Liu, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Shuang Qiu, Linqi Song

General AI

On-policy self-distillation (OPSD) has emerged as a practical method for training large language models (LLMs) to reason, where a single model acts as both the teacher and the student with different levels of information access. However, recent studies have found that the teacher's dense token-level supervision, condit…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.6

VisionAId: An Offline-First Multimodal Android Assistant for People with Visual Impairment, Featuring Personalized Object Retrieval

2026-07-02 · Cristian-Gabriel Florea, Stelian Spînu

General AI

Over 285 million people worldwide live with a visual impairment, for whom everyday tasks such as avoiding obstacles, locating personal belongings, recognizing familiar faces, or handling cash remain persistent obstacles to personal autonomy. Existing assistive applications are typically limited to recognizing predefine…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 17.5

All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation

2026-03-15 · Xudong Wang, Gan Li, Zhiyu Liu, Yao Wang, Lianqing Liu, Zhi Han

Research Track A · General AI

Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong V…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 17.5

Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

2026-03-26 · Dingjie Song, Tianlong Xu, Yi-Fan Zhang, Hang Li, Zhiling Yan, Xing Fan, Haoyang Li, Lichao Sun, Qingsong Wen

General AI

Assessing student handwritten scratchwork is crucial for personalized educational feedback but presents unique challenges due to diverse handwriting, complex layouts, and varied problem-solving approaches. Existing educational NLP primarily focuses on textual responses and neglects the complexity and multimodality inhe…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 17.5

MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

2026-03-29 · Shijian Wang, Jiarui Jin, Runhao Fu, Zexuan Yan, Xingjian Wang, Mengkang Hu, Eric Wang, Xiaoxi Li, Kangning Zhang, Li Yao, Wenxiang Jiao, Xuelian Cheng, Yuan Lu, Zongyuan Ge

General AI

Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage st…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 17.5

PRBench: End-to-end Paper Reproduction in Physics Research

2026-03-29 · Shi Qiu, Junyi Deng, Yiwei Deng, Haoran Dong, Jieyu Fu, Mao Li, Zeyu Li, Zhaolong Zhang, Huiwen Zheng, Leidong Bao, Anqi Lv, Zihan Mo, Yadi Niu, Yiyang Peng, Yu Tian, Yili Wang, Ziyu Wang, Zi-Yu Wang, Jiashen Wei, Liuheng Wu, Aoran Xue, Leyi Yang, Guanglu Yuan, Xiarui Zhan, Jingjun Zhang, Zifan Zheng, Pengfei Liu, Linrui Zhen, Kaiyang Li, Qichang Li, Ziheng Zhou, Guo-En Nian, Yunwei Xiao, Qing-Hong Cao, Linjie Dai, Xu Feng, Peng Gao, Ying Gu, Chang Liu, Jia Liu, Ming-xing Luo, Yan-Qing Ma, Liang-You Peng, Huichao Song, Shufeng Wang, Chenxu Wang, Tao Wang, Yi-Nan Wang, Chengyin Wu, Pengwei Zhao, Hua Xing Zhu

General AI

AI agents powered by large language models exhibit strong reasoning and problem-solving capabilities, enabling them to assist scientific research tasks such as formula derivation and code generation. However, whether these agents can reliably perform end-to-end reproduction from real scientific papers remains an open q…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 17.5

3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding

2026-04-09 · Makanjuola Ogunleye, Eman Abdelrahman, Ismini Lourentzou

General AI

Large multimodal models are increasingly used as the reasoning core of embodied agents operating in 3D environments, yet they remain prone to hallucinations that can produce unsafe and ungrounded decisions. Existing inference-time hallucination mitigation methods largely target 2D vision-language settings and do not tr…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 17.5

Towards Autonomous Mechanistic Reasoning in Virtual Cells

2026-04-14 · Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi

General AI

Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, w…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 17.5

Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs

2026-04-17 · Rohit Sinha, Aditya Kanade, Sai Srinivas Kancheti, Vineeth N Balasubramanian, Tanuja Ganu

General AI

Multimodal large language models (MLLMs) have achieved impressive progress on vision-language benchmarks, yet their capacity for visual cognitive and visuospatial reasoning remains less understood. We introduce "Mind's Eye", a multiple-choice benchmark of eight visuo-cognitive tasks inspired by classic human intelligen…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 17.5

Exploring Spatial Intelligence from a Generative Perspective

2026-04-22 · Muzhi Zhu, Shunyao Jiang, Huanyi Zheng, Zekai Luo, Hao Zhong, Anzhou Li, Kaijun Wang, Jintao Rong, Yang Liu, Hao Chen, Tao Lin, Chunhua Shen

General AI

Spatial intelligence is essential for multimodal large language models, yet current benchmarks largely assess it only from an understanding perspective. We ask whether modern generative or unified multimodal models also possess generative spatial intelligence (GSI), the ability to respect and manipulate 3D spatial cons…

huggingface Score 17.5

WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

2026-04-22 · Juyong Jiang, Chenglin Cai, Chansung Park, Jiasi Shen, Sunghun Kim, Jianguo Li, Yue Wang

General AI

While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic multi-page websites remain highly challenging. Existing works are often limited to single-page static websites, while agentic frameworks typically rely on multi-turn execu…

arxiv Score 17.5

When Continual Learning Moves to Memory: A Study of Experience Reuse in LLM Agents

2026-04-29 · Qisheng Hu, Quanyu Long, Wenya Wang

Research Track A · General AI

Memory-augmented LLM agents offer an appealing shortcut to continual learning: rather than updating model parameters, they accumulate experience in external memory, seemingly sidestepping the stability-plasticity dilemma of parametric learning. We show that this challenge does not disappear but resurfaces at the memory…

huggingface Score 17.5

Heterogeneous Scientific Foundation Model Collaboration

2026-04-30 · Zihao Li, Jiaru Zou, Feihao Fang, Xuying Ning, Mengting Ai, Tianxin Wei, Sirui Chen, Xiyuan Yang, Jingrui He

General AI

Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address special…

huggingface Score 17.5

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

2026-06-04 · Jiayu Liu, Cheng Qian, Zhenhailong Wang, Bingxuan Li, Jiateng Liu, Heng Wang, Jeonghwan Kim, Yumeng Wang, Xiusi Chen, Yi R. Fung, Heng Ji

General AI

Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual constraints. To addre…

huggingface Score 17.5

ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

2026-06-04 · Ashutosh Hathidara, Sai Shruthi Sistla, Sebastian Schreiber, Sahil Bansal

General AI

Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encoders that may under-capture specialized tool semantics, parametric tool retrieval addresses this by encoding each tool as a virtual token appended to t…

huggingface Score 17.5

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

2026-06-08 · Haoran Sun, Wenjie Li, Yujie Zhang, Zekai Lin, Fanrui Zhang, Kaitao Chen, Xingqi He, Yichen Li, Mianxin Liu, Lei Liu, Yankai Jiang

General AI

Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are redundant, noisy, a…

huggingface Score 17.5

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

2026-06-08 · Han Zhou, Adam X. Yang, Laurence Aitchison, Anna Korhonen, Albert Q. Jiang

General AI

Reinforcement learning with verifiable rewards (RLVR) has become a leading paradigm for improving the reasoning ability of large language models through outcome-based supervision. However, verifiable rewards frequently become uninformative at the group level: when all sampled traces of a given prompt receive identical …

arxiv Score 17.5

Amnesia: A Stealthy Replay Attack on Continual Learning Dreams

2026-06-10 · Ahmed Sharshar, Naveen Kumar Kummari, Mohsen Guizani

Research Track A · General AI

Continual learning (CL) models often use experience replay to reduce catastrophic forgetting, but their robustness to replay sampling interference remains underexplored. Existing CL attacks alter inputs or training pipelines (poisoning/backdoors) and rarely include explicit auditable constraints, limiting realism. Here…

huggingface Score 17.5

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

2026-06-16 · Jian Yang, Shawn Guo, Wei Zhang, Tianyu Zheng, Yaxin Du, Haau-Sing Li, Jiajun Wu, Yue Song, Yan Xing, Qingsong Cai, Zelong Huang, Chuan Hao, Ran Tao, Xianglong Liu, Wayne Xin Zhao, Mingjie Tang, Weifeng Lv, Ming Zhou, Bryan Dai

General AI

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop co…

arxiv Score 17.5

Gradient-Free Warm-Start Library Recovery: an Amortized-Regret Separation

2026-06-19 · Jianwei Lou

Research Track A · General AI

Continual learning that is gradient-free, local, online, and append-only is attractive for edge and streaming deployment, but its value is usually argued informally. We give a provable account on recurring-regime streams. Given segmentation, a warm-start library learner attains amortized recovery cost $O\!\big(KD/\vare…

arxiv Score 17.3

Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos

2026-03-26 · Abdullah Hamdi, Changchun Yang, Xin Gao

General AI

Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic …

arxiv Score 17.3

Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?

2026-03-26 · Liang Zhang, Yu Fu, Xinyi Jin

General AI

Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship us…

arxiv Score 17.3

LanteRn: Latent Visual Structured Reasoning

2026-03-26 · André G. Viveiros, Nuno Gonçalves, Matthias Lindemann, André Martins

General AI

While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. While recent approaches…

arxiv Score 17.3

SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding

2026-03-26 · Jiwook Han, Geo Ahn, Youngrae Kim, Jinwoo Choi

General AI

Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memorize dataset-specific s…

arxiv Score 17.3

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

2026-03-26 · Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, Guanjun Jiang

General AI

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or seq…

arxiv Score 17.3

ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning

2026-03-30 · Huanxuan Liao, Zhongtao Jiang, Yupu Hao, Yuqiao Tan, Shizhu He, Jun Zhao, Kun Xu, Kang Liu

General AI

Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long temporal context prohibitive. We argue that the bottleneck lies not in how post-encoding representations are compresse…

arxiv Score 17.3

EC-Bench: Enumeration and Counting Benchmark for Ultra-Long Videos

2026-03-31 · Fumihiko Tsuchiya, Taiki Miyanishi, Mahiro Ukai, Nakamasa Inoue, Shuhei Kurita, Yusuke Iwasawa, Yutaka Matsuo

General AI

Counting in long videos remains a fundamental yet underexplored challenge in computer vision. Real-world recordings often span tens of minutes or longer and contain sparse, diverse events, making long-range temporal reasoning particularly difficult. However, most existing video counting benchmarks focus on short clips …

arxiv Score 17.3

DIRECT: Video Mashup Creation via Hierarchical Multi-Agent Planning and Intent-Guided Editing

2026-04-06 · Ke Li, Maoliang Li, Jialiang Chen, Jiayu Chen, Zihao Zheng, Shaoqi Wang, Xiang Chen

General AI

Video mashup creation represents a complex video editing paradigm that recomposes existing footage to craft engaging audio-visual experiences, demanding intricate orchestration across semantic, visual, and auditory dimensions and multiple levels. However, existing automated editing frameworks often overlook the cross-l…

arxiv Score 17.3

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

2026-04-06 · Shuai Liu, Shulin Tian, Kairui Hu, Yuhao Dong, Zhe Yang, Bo Li, Jingkang Yang, Chen Change Loy, Ziwei Liu

General AI

Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints, as strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent scalable training an…

arxiv Score 17.3

ACE-Bench: Agent Configurable Evaluation with Scalable Horizons and Controllable Difficulty under Lightweight Environments

2026-04-07 · Wang Yang, Chaoda Song, Xinpeng Li, Debargha Ganguly, Chuang Ma, Shouren Wang, Zhihao Dou, Yuli Zhou, Vipin Chaudhary, Xiaotian Han

General AI

Existing Agent benchmarks suffer from two critical limitations: high environment interaction overhead (up to 41% of total evaluation time) and imbalanced task horizon and difficulty distributions that make aggregate scores unreliable. To address these issues, we propose ACE-Bench built around a unified grid-based plan…

arxiv Score 17.3

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

2026-04-07 · Bowen Ye, Rang Li, Qibin Yang, Yuanxin Liu, Linli Yao, Hanglong Lv, Zhihui Xie, Chenxin An, Lei Li, Lingpeng Kong, Qi Liu, Zhifang Sui, Tong Yang

General AI

Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-world software environments. However, existing agent benchmarks suffer from three critical limitations: (1) trajectory-opaque grading that checks only final outputs, (2) underspecified safety and robustness evalu…

arxiv Score 17.3

Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

2026-04-07 · Juekai Lin, Yun Zhu, Honglin Lin, Sijing Li, Tianwei Lin, Zheng Liu, Xiaoyang Wang, Wenqiao Zhang, Lijun Wu

General AI

Graphics Program Synthesis is pivotal for interpreting and editing visual data, effectively facilitating the reverse-engineering of static visuals into editable TikZ code. While TikZ is the de facto standard for scientific schematics due to its programmatic flexibility, its requirement for rigorous spatial precision pr…

arxiv Score 17.3

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

2026-04-09 · Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna

Research Track B · General AI

Web agents--autonomous systems that navigate and execute tasks on the web on behalf of users--have the potential to transform how people interact with the digital world. However, the most capable web agents today rely on proprietary models with undisclosed training data and recipes, limiting scientific understanding, r…

arxiv Score 17.3

Visually-grounded Humanoid Agents

2026-04-09 · Hang Ye, Xiaoxuan Ma, Fan Lu, Wayne Wu, Kwan-Yee Lin, Yizhou Wang

General AI

Digital human generation has been studied for decades and supports a wide range of real-world applications. However, most existing systems are passively animated, relying on privileged state or scripted control, which limits scalability to novel environments. We instead ask: how can digital humans actively behave using…

arxiv Score 17.3

CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning

2026-04-12 · Cheng-Yen Li, Xuanjun Chen, Claire Lin, Wei-Yu Chen, Wenhua Nie, Hung-Yi Lee, Jyh-Shing Roger Jang

Research Track A · General AI

Large Language Models (LLMs) struggle with knowledge-intensive tasks due to hallucinations and fragmented reasoning over dispersed information. While Retrieval-Augmented Generation (RAG) grounds generation in external sources, existing methods often treat evidence as isolated units, failing to reconstruct the logical c…

arxiv Score 17.3

Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

2026-04-13 · Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak

General AI

We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathem…

arxiv Score 17.3

Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo

2026-04-13 · Artem Gadzhiev, Andrew Kislov

General AI

Providing AI agents with reliable long-term memory that does not hallucinate remains an open problem. Current approaches to memory for LLM agents -- sliding windows, summarization, embedding-based RAG, and flat fact extraction -- each reduce token cost but introduce catastrophic information loss, semantic drift, or unc…

arxiv Score 17.3

Boosting Visual Instruction Tuning with Self-Supervised Guidance

2026-04-14 · Sophia Sirko-Galouchenko, Monika Wysoczanska, Andrei Bursuc, Nicolas Thome, Spyros Gidaris

General AI

Multimodal large language models (MLLMs) perform well on many vision-language tasks but often struggle with vision-centric problems that require fine-grained visual reasoning. Recent evidence suggests that this limitation arises not from weak visual representations, but from under-utilization of visual information duri…

arxiv Score 17.3

WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents

2026-04-14 · Yulin Chen, Tri Cao, Haoran Li, Yue Liu, Yibo Li, Yufei He, Le Minh Khoi, Yangqiu Song, Shuicheng Yan, Bryan Hooi

Research Track B · General AI

Web agents powered by vision-language models (VLMs) enable autonomous interaction with web environments by perceiving and acting on both visual and textual webpage content to accomplish user-specified tasks. However, they are highly vulnerable to prompt injection attacks, where adversarial instructions embedded in HTML…

arxiv Score 17.3

AutoPKG: An Automated Framework for Dynamic E-commerce Product-Attribute Knowledge Graph Construction

2026-04-18 · Pollawat Hongwimol, Haoning Shang, Chutong Wang, Zhichao Wan, Yi Gao, Yuanming Li, Lin Gui, Wenhao Sun, Cheng Yu

Research Track A · General AI

Product attribute extraction in e-commerce is bottlenecked by ontologies that are inconsistent, incomplete, and costly to maintain. We present AutoPKG, a multi-agent Large Language Model (LLM) framework that automatically constructs a Product-attribute Knowledge Graph (PKG) from multimodal product content. AutoPKG indu…

arxiv Score 17.3

Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion

2026-04-20 · Terry Leitch

General AI

We present a systematic evaluation of large language model families -- spanning both proprietary cloud APIs and locally-hosted open-source models -- on two purpose-built benchmarks for System Dynamics AI assistance: the CLD Leaderboard (53 tests, structured causal loop diagram extraction) and the Discu…

arxiv Score 17.3

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

2026-04-20 · Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba

General AI

Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems toget…

arxiv Score 17.3

WorldDB: A Vector Graph-of-Worlds Memory Engine with Ontology-Aware Write-Time Reconciliation

2026-04-20 · Harish Santhanalakshmi Ganesan

General AI

Persistent memory is the bottleneck separating stateless chatbots from long-running agentic systems. Retrieval-augmented generation (RAG) over flat vector stores fragments facts into chunks, loses cross-session identity, and has no first-class notion of supersession or contradiction. Recent bitemporal knowledge-graph s…

arxiv Score 17.3

Time Series Augmented Generation for Financial Applications

2026-04-21 · Anton Kolonin, Alexey Glushchenko, Evgeny Bochkov, Abhishek Saxena

General AI

Evaluating the reasoning capabilities of Large Language Models (LLMs) for complex, quantitative financial tasks is a critical and unsolved challenge. Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations. To address this, we introduce a novel evaluation methodol…

arxiv Score 17.3

Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks

2026-04-21 · Jing Jin, Hao Liu, Yan Bai, Yihang Lou, Zhenke Wang, Tianrun Yuan, Juntong Chen, Yongkang Zhu, Fanhu Zeng, Xuanyu Zhu, Yige Xu

General AI

Multimodal large language models (MLLMs) have shown promising reasoning abilities, yet evaluating their performance in specialized domains remains challenging. STEM reasoning is a particularly valuable testbed because it provides highly verifiable feedback, but existing benchmarks often permit unimodal shortcuts due to…

arxiv Score 17.3

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks

2026-04-27 · Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

Research Track B · General AI

Existing web agent benchmarks have largely converged on short, single-site tasks that frontier models are approaching saturation on. However, real-world web use consists of long-horizon, multi-site workflows. Common web navigation tasks, such as comparing products across different domains, planning trips across multipl…

arxiv Score 17.3

FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards

2026-04-29 · Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue, Kefei Chen, Yu Zhuang, Haoxiang Guan, Jiyan He, Jian Li, Yitong Duan, Yu Shi, Mengting Hu, Shuxin Zheng

General AI

Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. Just as interactive environments have often dr…

arxiv Score 17.3

AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images

2026-04-30 · Bo Zhang, Tzu-Yen Ma, Zichen Tang, Junpeng Ding, Zirui Wang, Yizhuo Zhao, Peilin Gao, Zijie Xi, Zixin Ding, Haiyang Sun, Haocheng Gao, Yuan Liu, Liangjia Wang, Yiling Huang, Yujie Wang, Yuyue Zhang, Ronghui Xi, Yuanze Li, Jiacheng Liu, Zhongjun Yang, Haihong E

General AI

We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS features three key advances: (1) Domain-Specific Complexity: covering seven academic categories with 39 fine-grained subtypes, exposing intrinsic forensic difficulty, where e…

arxiv Score 17.3

SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images

2026-04-30 · Jialu Shen, Han Lyu, Suyang Zhong, Hanzheng Li, Haoyi Tao, Nan Wang, Changhong Chen, Xi Fang

General AI

Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-specific characteristics. Here we introduce SpecVQA, a professional scientific-image benchmark for evaluating multimodal mod…

arxiv Score 17.3

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

2026-05-01 · Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng

General AI

While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence lengt…

arxiv Score 17.3

Bridging the Agent-World Gap: Text World Models for LLM-based Agents

2026-06-08 · Yixia Li, Hongru Wang, Peng Lai, Zhiwen Ruan, He Zhu, Youxin Zhu, Ganlong Zhao, Minda Hu, Yun Chen, Sibei Yang, Peng Li, Jeff Z. Pan, Jia Pan, Guanhua Chen, Yang Liu, Guanbin Li

General AI

Large language model (LLM)-based agents are increasingly used in interactive textual environments, from web navigation and code editing to tool use and long-horizon dialogue. Yet many remain largely reactive, mapping observations to actions without an explicit model of how these environments are structured and evolve. …

arxiv Score 17.3

HDSL: A Hierarchical Domain-Specific Language for Structured 3D Indoor Scene Generation and Localized Editing with LLM Agents

2026-06-08 · Letian Li, Chao Shen, Shuzhao Xie, Chenghao Gu, ZhengXiao He, Yu Meng, Xin Yang, Wenyuan Jiang, Zhi Wang

General AI

Text-driven indoor scene generation and editing require an intermediate representation that language models can both produce and revise. Existing LLM-based systems often rely on scene graphs or global constraint lists, which are compact but underspecify local geometry and make instruction-based edits difficult to local…

arxiv Score 17.3

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

2026-06-11 · Jundong Xu, Qingchuan Li, Jiaying Wu, Yihuai Lan, Shuyue Stella Li, Huichi Zhou, Bowen Jiang, Lei Wang, Jun Wang, Anh Tuan Luu, Caiming Xiong, Hae Won Park, Bryan Hooi, Zhiyuan Hu

General AI

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated …

arxiv Score 17.3

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

2026-06-15 · Anzhe Xie, Weihang Su, Yujia Zhou, Yiqun Liu, Qingyao Ai

General AI

Meta-analysis is a demanding form of evidence synthesis that combines literature retrieval, PI/ECO-guided study selection, and statistical aggregation. Its structured, verifiable workflow makes it an ideal substrate for evaluating systematic scientific reasoning, yet existing benchmarks lack ground truth across the ful…

arxiv Score 17.3

DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents

2026-06-15 · Minghang Zhu, Chuyang Wei, Junhao Xu, Yilin Cheng, Zhumin Chen, Jiyan He

General AI

Deep research agents synthesize long-form reports by searching and reasoning over retrieved evidence. Reinforcement learning with rubric-based rewards improves these agents by optimizing them against checkable criteria that translate report quality into reward signals, but its efficiency depends on whether those criter…

arxiv Score 17.3

What Should a Streaming Video Model Remember?

2026-06-15 · Haonan Ge, Yiwei Wang, Hang Wu, Yujun Cai

Research Track A · General AI

Streaming video understanding models must answer queries at any moment during an ongoing stream, using only what they have observed so far and under fixed memory and computation budgets. Existing methods address this by adding memory banks, retrieval modules, or visual token compression to preserve long-range history. …

huggingface Score 17.2

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

2026-06-28 · Mengqi Yuan, Zilong Zhou, Xinzhuang Xiong, Weiming Wu, Jiayang Sun, Jiamin Song, Kaiqian Cui, Bowen Wang, Haoyuan Wu, Yitong Li, Dunjie Lu, Haikong Lu, Qi Zhen, Xinyuan Wang, Jiaqi Deng, Yuhao Yang, Cheng Chen, Boyuan Zheng, Alex Su, Xiao Yu, Hao Zou, Saaket Agashe, Xing Han Lu, Manpreet Kaur, Zhengyang Qi, Vincent Sunn Chen, Frederic Sala, Dayiheng Liu, Junyang Lin, Zhou Yu, Yu Su, Siva Reddy, Xin Eric Wang, Peng Qi, Tianbao Xie, Tao Yu

Research Track B · General AI

Existing computer-use benchmarks fail to capture the realism, complexity, and long-horizon demands of real-world computer use, limiting their ability to reveal the limitations of frontier agents. We introduce OSWorld 2.0, a benchmark of 108 long-horizon computer-use workflows across everyday and professional tasks, des…

huggingface Score 17.0

Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO

2026-04-14 · Zhiyuan Zeng, Jiameng Huang, Zhangyue Yin, Jiashuo Liu, Ziniu Li, Bingrui Li, Yuhao Wu, Yining Zheng, Ge Zhang, Wenhao Huang, Xipeng Qiu

General AI

Reinforcement learning with verifiable rewards (RLVR) has become a central paradigm for improving reasoning and code generation in large language models, and GRPO-style training is widely adopted for its simplicity and effectiveness. However, an important design choice remains underexplored: how token-level policy grad…

arxiv Score 17.0

MCPO: Mastery-Consolidated Policy Optimization for Large Reasoning Models

2026-04-18 · Zhaokang Liao, Yingguo Gao, Yi Yang, Yongheng Hu, Jingting Ding

Research Track A · General AI

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach to improve the reasoning abilities of Large Language Models (LLMs). Among RLVR algorithms, Group Relative Policy Optimization (GRPO) and its variants have demonstrated strong performance and high training efficiency. However, GRPO…

arxiv Score 17.0

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

2026-04-24 · Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia

Research Track B · General AI

As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world…

arxiv Score 17.0

Forager: a lightweight testbed for continual learning with partial observability in RL

2026-05-01 · Steven Tang, Xinze Xiong, Anna Hakhverdyan, Andrew Patterson, Jacob Adkins, Jiamin He, Esraa Elelimy, Parham Mohammad Panahi, Martha White, Adam White

Research Track A · General AI

In continual reinforcement learning (CRL), good performance requires never-ending learning, acting, and exploration in a big, partially observable world. Most CRL experiments have focused on loss of plasticity -- the inability to keep learning -- in one-off experiments where some unobservable non-stationarity is added …

Review: pending
Role: unreviewed
Read: now

huggingface Score 17.0

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

2026-05-01 · Ziwen Zhao, Menglin Yang

General AI

Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cro…

Review: pending
Role: unreviewed
Read: now

huggingface Score 17.0

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

2026-05-03 · Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang

General AI

Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, loc…

Review: pending
Role: unreviewed
Read: now

huggingface Score 17.0

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments

2026-05-04 · Ruoqi Liu, Imran Q. Mohiuddin, Austin J. Schoeffler, Kavita Renduchintala, Ashwin Nayak, Prasantha L. Vemu, Shivam C. Vedak, Kameron C. Black, John L. Havlik, Isaac Ogunmola, Stephen P. Ma, Roopa Dhatt, Jonathan H. Chen

General AI

We introduce PhysicianBench, a benchmark for evaluating LLM agents on physician tasks grounded in real clinical settings within electronic health record (EHR) environments. Existing medical agent benchmarks primarily focus on static knowledge recall, single-step atomic actions, or action intent without verifiable execut…

Review: pending
Role: unreviewed
Read: now

arxiv Score 17.0

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation

2026-05-06 · Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Stjepan Picek, Saraga Sakthidharan

Research Track A · General AI

The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank Adaptation (LoRA) modules. However, integrating these third-party adapters often induces catastrophic forgetting of the base model's foundational safety alignment. Restor…

Review: pending
Role: unreviewed
Read: now

huggingface Score 17.0

GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

2026-05-28 · Xiaohang Tang, Keyue Jiang, Che Liu, Qifang Zhao, Xiaoxiao Xu, Sangwoong Yoon, Ilija Bogunovic

General AI

Reinforcement learning (RL) can be used to improve the policy (denoiser) of diffusion large language models (dLLMs), while being hindered by the intractability of the policy likelihood. A dominant and efficient family of methods replaces the likelihood in standard RL with its evidence lower bound (ELBO), estimated from…

Review: pending
Role: unreviewed
Read: now

arxiv Score 17.0

Theoretical Foundations of Continual Learning via Drift-Plus-Penalty

2026-06-07 · Nazreen Shah, Govinda Arya, Bharath B. N., Ranjitha Prasad

Research Track A · General AI

In many real-world settings, data streams are nonstationary and arrive sequentially, requiring learning systems to adapt continuously without retraining from scratch. Continual learning (CL) addresses this challenge by incorporating new tasks while mitigating catastrophic forgetting, where learning new information degr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 17.0

TaskFusion: Continual Anomaly Detection for Heterogeneous Tabular Data

2026-06-10 · Dayananda Herurkar, Federico Raue, Joachim Folz, Jörn Hees, Andreas Dengel

Research Track A · General AI

Continual anomaly detection in tabular data is challenging and remains largely underexplored, particularly in settings with heterogeneous feature schemas, distribution shifts, and severe class imbalance. In many real-world applications, data arrive sequentially from diverse domains, rendering conventional continual lea…

Review: pending
Role: unreviewed
Read: now

arxiv Score 17.0

Continual Backdoor Training in IoT/CPS

2026-06-12 · Oxana Salish, Kuniyilh S

Research Track A

Internet of Things (IoT) and Cyber-physical systems (CPS) increasingly rely on continual learning (CL) to adapt to evolving environments, device heterogeneity, and concept drift, thereby improving overall utility. While continual adaptation is essential for long-lived IoT deployments where data patterns evolve, it also…

Review: pending
Role: unreviewed
Read: now

arxiv Score 17.0

Task-Differentiated Atomic Skill Expansion and Routing for Continual Learning Across Highly Heterogeneous Tasks

2026-06-19 · Jiacheng Wang, Xinjia He, Qi Ding, Yutao Yang, Jie Zhou, Liyang Yu, Liang Dou, Qin Chen

Research Track A · General AI

Continual learning (CL) is commonly studied under the assumption that sequential tasks are semantically related or structurally similar. However, in highly heterogeneous settings, where tasks differ substantially in reasoning patterns and input-output formats, existing methods often suffer from catastrophic forgetting …

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

2026-05-03 · Arash Ahmadi, Sarah Sharif, Yaser Banad

General AI

Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives policy optimization. This paper introduc…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

2026-05-04 · Chenchen Zhang

General AI

As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individual actions but also how work is spawned, delegated, communicated, aggregated, and stopped. This paper studies RL for LLM-based multi-agent systems through orchestration…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

2026-05-07 · Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah, Corey Feld

General AI

Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation (RAG) that does not…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

2026-05-20 · Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi

Research Track B · General AI

LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requirin…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning

2026-05-21 · Haokun Wen, Xuemeng Song, Xinghao Xie, Xiaolin Chen, Xiangyu Zhao, Weili Guan

General AI

Fashion image retrieval is a cornerstone of modern e-commerce systems. A unified framework that supports diverse query formats and search intentions is highly desired in practice. However, existing approaches focus on narrow retrieval tasks and do not fully capture such diversity. Therefore, in this work, we aim to dev…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

ChartFI: Benchmarking Faithfulness and Insightfulness of Chart Descriptions from Multimodal Large Language Models

2026-05-22 · Fen Wang, Zekai Shao, Qiman Kang, Chunran Hu, Zhixuan Zhang, Lexu Xie, Chao Liu, Siming Chen

General AI

Chart descriptions are essential for accessibility, cross-modal retrieval, and assisting readers in extracting insights from complex visualizations. As multimodal large language models (MLLMs) are increasingly adopted for automated chart description generation, a critical question arises: how faithfully and insightfull…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs

2026-05-22 · Jiazhen Pan, Weixiang Shen, Jun Li, Julian Canisius, Felix Bitzer, Paula Roßmüller, Jiancheng Yang, Virginie Kreutzinger, Daniel Rueckert, Benedikt Wiestler

General AI

Medical diagnosis is not a single prediction from a fully specified vignette. It is a sequential workup: clinicians decide what evidence to obtain, revise a differential diagnosis, and stop when the diagnosis is sufficiently supported. Most medical AI benchmarks instead reveal the relevant context upfront and score onl…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

2026-05-22 · Rim Assouel, Amir Bar, Michal Drozdzal, Adriana Romero-Soriano

General AI

Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks. In this work, we propose Procedurally Generated Tasks (PGT), a simple data-driven framework that serves a dual purpose: inducing fine-grained visual understanding and acting as a l…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?

2026-05-28 · Weihan Peng, Chenxu Zhang, Qianao Wang, Yuling Shi, Heng Lian, Qihong Mao, Jiahao Pang, Chunliang Feng, Bowen Li, Xiaodong Gu

General AI

While LLM agents have demonstrated remarkable task-oriented abilities such as planning, reasoning, and action, few works have treated them as complete human personalities where emotional dimensions hold equal importance. In this paper, we introduce a novel benchmark to systematically assess whether LLM agents can simul…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

Task-Focused Memorization for Multimodal Agents

2026-05-29 · Tao Zou, Yichen He, Tian Qiu, Yuan Lin, Hang Li

Research Track A · General AI

Long-term memory is essential for multimodal agents to build coherent experience, accumulate world knowledge, and achieve continual learning. However, constructing effective memory goes beyond memory module design and basic requirements such as accuracy and fidelity; the key challenge lies in determining what to memori…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.8

nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving

2026-05-29 · Zhiyu Huang, Johnson Liu, Rui Song, Zewei Zhou, Ruining Yang, Yun Zhang, Tianhui Cai, Hanyin Zhang, Mingxuan Gao, Valeria Xu, Jiali Chen, Yishan Shen, Yiluan Guo, Tony Qi, Jiaqi Ma

General AI

Reasoning is essential for autonomous driving (AD) in long-tail scenarios, where vehicles must apply commonsense knowledge, understand spatial relations, infer agent interactions, and make safe decisions. However, existing AD datasets and benchmarks mainly target perception, prediction, or planning, and provide limited…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.8

PACE: A Proxy for Agentic Capability Evaluation

2026-07-02 · Yueqi Song, Lintang Sutawika, Jiarui Liu, Lindia Tjuatja, Jiayi Geng, Yunze Xiao, Daniel Lee, Aditya Bharat Soni, Vincent Lo, Xiang Yue, Graham Neubig

General AI

Evaluating LLM agents on benchmarks like SWE-Bench and GAIA can be expensive and time-consuming, and it requires complex infrastructure. A single evaluation can cost thousands of dollars and take days to complete. In contrast, non-agentic LLM benchmarks that test individual capabilities (e.g., reasoning, code generation) are…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.6

Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

2026-07-02 · Yuxuan Li, Lingxi Xie, Xinyue Huo, Jihao Qiu, Jiacheng Shao, Pengfei Chen, Jiannan Ge, Kaiwen Duan, Qi Tian

General AI

Long-form TV dramas present a formidable challenge for comprehensive video understanding, where deciphering complex storylines often relies on speaker recognition, the task of accurately attributing each spoken utterance to its respective character. In this paper, we advance this field through two primary contr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.6

Seek to Segment: Active Perception for Panoramic Referring Segmentation

2026-07-02 · Song Tang, Shuming Hu, Xincheng Shuai, Henghui Ding, Yu-Gang Jiang

General AI

Existing referring segmentation models passively process static images captured from fixed perspectives, limiting their applicability in Embodied AI, where agents must perform active perception in the continuous 360° environments. To bridge this gap, we introduce a novel task: Active Panoramic Referring Segmenta…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.6

Will Scaling Improve Social Simulation with LLMs?

2026-07-02 · Caleb Ziems, William Held, Su Doga Karaca, David Grusky, Tatsunori Hashimoto, Diyi Yang

General AI

Large Language Model (LLM) social simulations are a promising research method, but they are not yet faithful enough to be adopted widely. In this work, we investigate whether the current scaling paradigm in language modeling is likely to close these gaps, or whether simulation fidelity is orthogonal to general capabili…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems

2026-01-08 · Uday Allu, Sonu Kedia, Tanmay Odapally, Biddwan Ahmed

General AI

Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking strategies to balance retrieval quality, latency, and operational cost. Traditional chunking approaches, such as fixed-size, rule-based, or fully agentic chunking, often suffer from high token consumption, redundant text gener…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution

2026-03-19 · Minhua Lin, Zhiwei Zhang, Hanqing Lu, Hui Liu, Xianfeng Tang, Qi He, Xiang Zhang, Suhang Wang

General AI

Memory-augmented LLM agents maintain external memory banks to support long-horizon interaction, yet most existing systems treat construction, retrieval, and utilization as isolated subroutines. This creates two coupled challenges: strategic blindness on the forward path of the memory cycle, where construction and retri…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Signals: Trajectory Sampling and Triage for Agentic Interactions

2026-04-01 · Shuguang Chen, Adil Hafeez, Salman Paracha

General AI

Agentic applications based on large language models increasingly rely on multi-step interaction loops involving planning, action execution, and environment feedback. While such systems are now deployed at scale, improving them post-deployment remains challenging. Agent trajectories are voluminous and non-deterministic,…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

2026-04-03 · Shufan Jiang, Chios Chen, Zhiyang Chen

General AI

The autonomous discovery of bugs remains a significant challenge in modern software development. Compared to code generation, the complexity of dynamic runtime environments makes bug discovery considerably harder for large language models (LLMs). In this paper, we take game development as a representative domain and in…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.5

SuperLocalMemory V3.3: The Living Brain -- Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems

2026-04-06 · Varun Pratap Bhardwaj

Research Track A · General AI

AI coding agents operate in a paradox: they possess vast parametric knowledge yet cannot remember a conversation from an hour ago. Existing memory systems store text in vector databases with single-channel retrieval, require cloud LLMs for core operations, and implement none of the cognitive processes that make human m…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.5

In-Place Test-Time Training

2026-04-07 · Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai

Research Track A · General AI

The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast we…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization

2026-04-09 · Sai Srinivas Kancheti, Aditya Kanade, Rohit Sinha, Vineeth N Balasubramanian, Tanuja Ganu

General AI

Multimodal reasoning models (MRMs) trained with reinforcement learning with verifiable rewards (RLVR) show improved accuracy on visual reasoning benchmarks. However, we observe that accuracy gains often come at the cost of reasoning quality: generated Chain-of-Thought (CoT) traces are frequently inconsistent with the f…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

LPM 1.0: Video-based Character Performance Model

2026-04-09 · Ailing Zeng, Casper Yang, Chauncey Ge, Eddie Zhang, Garvey Xu, Gavin Lin, Gilbert Gu, Jeremy Pi, Leo Li, Mingyi Shi, Sheng Bi, Steven Tang, Thorn Hang, Tobey Guo, Vincent Li, Xin Tong, Yikang Li, Yuchen Sun, Yue Zhao, Yuhan Lu, Yuwei Li, Zane Zhang, Zeshi Yang, Zi Ye

General AI

Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Learning such performance from video is a promising alternative to traditional 3D pipelines. However, existing video models struggle to jointly achieve high expressiveness,…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

2026-04-13 · Yinuo Yang, Zixian Ma, Manasi Ganti, Jieyu Zhang, Ranjay Krishna

General AI

We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward models evaluate each response independently, requiring multiple forward passes, one for each potential response. Our approach concatenates multiple responses with separato…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness

2026-04-14 · Tomer Ashuach, Liat Ein-Dor, Shai Gretz, Yoav Katz, Yonatan Belinkov

General AI

Humans use introspection to evaluate their understanding through private internal states inaccessible to external observers. We investigate whether large language models possess similar privileged knowledge about answer correctness, information unavailable through external observation. We train correctness classifiers …

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

2026-04-15 · Genghan Zhang, Shaowei Zhu, Anjiang Wei, Zhenyu Song, Allen Nie, Zhen Jia, Nandita Vijaykumar, Yida Wang, Kunle Olukotun

General AI

We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows

2026-04-17 · Jize Wang, Xuanxuan Liu, Yining Li, Songyang Zhang, Yijun Wang, Zifei Shan, Xinyi Le, Cailian Chen, Xinping Guan, Dacheng Tao

General AI

The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements, relying on AI-generated queries, dummy tools, and limited system-level coordination…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

CreativeGame: Toward Mechanic-Aware Creative Game Generation

2026-04-21 · Hongnan Ma, Han Wang, Shenglin Wang, Tieyue Yin, Yiwei Shi, Yucong Huang, Yingtian Zou, Muning Wen, Mengyue Yang

General AI

Large language models can generate plausible game code, but turning this capability into iterative creative improvement remains difficult. In practice, single-shot generation often produces brittle runtime behavior, weak accumulation of experience across versions, and creativity scores that are too subjective to serve …

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings

2026-04-21 · Zijie Li, Yichun Shi, Jingxiang Sun, Ye Wang, Yixuan Huang, Zhiyao Guo, Xiaochen Lian, Peihao Zhu, Yu Tian, Zhonghua Zhai, Peng Wang

General AI

We present MMCORE, a unified framework designed for multimodal image generation and editing. MMCORE leverages a pre-trained Vision-Language Model (VLM) to predict semantic visual embeddings via learnable query tokens, which subsequently serve as conditioning signals for a diffusion model. This streamlined design effect…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution

2026-04-21 · Xiachong Feng, Yi Jiang, Xiaocheng Feng, Deyi Yin, Libo Qin, Yangfan Ye, Lei Huang, Weitao Ma, Yuxuan Gu, Chonghan Qin, Bing Qin, Lingpeng Kong

General AI

Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existin…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.5

ZenBrain: A Neuroscience-Inspired 7-Layer Memory Architecture for Autonomous AI Systems

2026-04-26 · Alexander Bering

Research Track A · General AI

Despite a century of empirical memory research, existing AI agent memory systems rely on system-engineering metaphors (virtual-memory paging, flat LLM storage, Zettelkasten notes), none integrating principles of consolidation, forgetting, and reconsolidation. We present ZenBrain, a multi-layer memory architecture integ…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Improving Vision-language Models with Perception-centric Process Reward Models

2026-04-27 · Yingqian Min, Kun Zhou, Yifan Li, Yuhuan Wu, Han Peng, Yifan Du, Wayne Xin Zhao, Min Yang, Ji-Rong Wen

General AI

Recent advancements in reinforcement learning with verifiable rewards (RLVR) have significantly improved the complex reasoning ability of vision-language models (VLMs). However, its outcome-level supervision is too coarse to diagnose and correct errors within the reasoning chain. To this end, we propose Perceval, a pro…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.5

MPCS: Neuroplastic Continual Learning via Multi-Component Plasticity and Topology-Aware EWC

2026-05-04 · Joern Hentsch

Research Track A · General AI

Continual learning systems face a fundamental tension between plasticity -- acquiring new knowledge -- and stability -- retaining prior knowledge. We introduce MPCS (Multi-Plasticity Continual System), a neuroplastic architecture that integrates eleven complementary mechanisms: task-driven neurogenesis, Fourier-encoded…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.5

Intrinsic Vicarious Conditioning for Deep Reinforcement Learning

2026-05-12 · Rodney A Sanchez, Ferat Sahin, Alex Ororbia, Jamison Heard

Research Track A · General AI

Advancements in reinforcement learning have produced a variety of complex and useful intrinsic driving forces; crucially, these drivers operate under a direct conditioning paradigm. This form of conditioning limits our agents' capacity by restricting how they learn from the environment as well as from others. Off-polic…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.5

Dynamic Mixture of Latent Memories for Self-Evolving Agents

2026-05-21 · Dianzhi Yu, Vireo Zhang, Hongru Wang, Yanyu Chen, Minda Hu, Wanghan Xu, Siki Chen, Philip Torr, Zhenfei Yin, Irwin King

Research Track A · General AI

Achieving self-evolution in intelligent agents requires the continual accumulation of new knowledge across changing task sequences without forgetting previously acquired abilities. Existing approaches either internalize knowledge by updating model parameters, which induces catastrophic forgetting, or rely on external m…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.5

Active Continual Learning with Metaplastic Binary Bayesian Neural Networks

2026-05-28 · Kellian Cottart, Théo Ballet, Djohan Bonnet, Damien Querlioz

Research Track A · General AI

Always-on edge systems must keep learning as conditions change under tight compute budgets and must detect unreliable predictions. Bayesian binary neural networks are attractive in this setting, but mean-field Bernoulli posteriors can saturate on long non-stationary streams, wiping out epistemic uncertainty and freezin…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

2026-06-01 · Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita, Dmytro Kyrylenko, Sofiia Pidturkina, Julia Stadnyk

General AI

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments, and shifting market assumptions, while the agent answers, retrieves, acts, and forgets. In finance, this is not just inconvenient. In tasks…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

2026-06-04 · Haibo Wang, Lifu Huang

General AI

Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, resulting in representations that fail to maintain geometric and spatial consistency across video frames. Given the scarcity of large-scale 3D data, we present GeoVR, a novel framework that learns geometric repr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.5

Not Just After One: Sleep-Inspired Replay Prevents Catastrophic Forgetting After Sequential Tasks

2026-06-07 · Anthony Bazhenov, Jean Erik Delanois, Giri P. Krishnan

Research Track A

One of the critical limitations of artificial neural networks is their lack of ability to continually learn: training on new tasks often leads to interference and forgetting of the previous ones. While several algorithms have been proposed to protect old memories from interference, they are typically applied during or …

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation

2026-06-08 · Siyuan Liu, Jinyang Wu

General AI

Multimodal large language models (MLLMs) commonly inherit the deep, symmetric Transformer backbone designed for unimodal text modeling, and apply the same computation uniformly to image and language tokens. This design overlooks a key modality asymmetry: image and text tokens differ substantially in information density…

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text

2026-06-08 · Yutong Bian, Dongjie Cheng, Heming Xia, Yongqi Li, Wenjie Li

General AI

Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both textual rationales and …

Review: pending
Role: unreviewed
Read: now

huggingface Score 16.5

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

2026-06-09 · Xucong Wang, Ziyu Ma, Shidong Yang, Tongwen Huang, Pengkun Wang, Yong Wang, Xiangxiang Chu

General AI

Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent, a framework …

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.5

Medical Heuristic Learning: An LLM-Driven Framework for Interpretable and Auditable Clinical Decision Rules

2026-06-15 · Wei Xu, Ke Yang, Gang Luo, Keli Zheng, Lingyan Hu, Jing Wang, Kefeng Li

Research Track A · General AI

Predictive modeling for clinical tabular data is central to clinical decision support and therefore requires not only strong predictive performance but also transparent decision logic. Although deep learning and tree-based ensemble methods can achieve high accuracy, their black-box nature remains a major obstacle to cl…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 16.4

Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do

2026-06-21 · Zhuoran Jin, Kejian Zhu, Hongbang Yuan, Yupu Hao, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

General AI

Chain-of-Thought (CoT) has become a standard method for improving reasoning capabilities in large language models (LLMs) by eliciting step-by-step thinking, but its effectiveness in multimodal tasks remains unclear. In this paper, we aim to systematically investigate the key question: What can multimodal Chain-of-Thoug…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 16.4

ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection

2026-06-23 · Chenhao Dang, Dantong Zhu, Jun Yang, Conghui He, Weijia Li

General AI

Multimodal misinformation detection is increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text-image framing errors. Existing benchmarks and methods remain poorly matched to this setting: they usually isolate short captions, single images, …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

Environment Maps: Structured Environmental Representations for Long-Horizon Agents

2026-03-24 · Yenchia Feng, Chirag Sharma, Karime Maamari

Research Track B · General AI

Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows remains an open problem. In long-horizon settings, agents frequently suffer from cascading errors and environmental stochasticity; a single misstep in a dynamic interface can lead to task failure, resulting in h…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

Seeing to Ground: Visual Attention for Hallucination-Resilient MDLLMs

2026-03-26 · Vishal Narnaware, Animesh Gupta, Kevin Zhai, Zhenyi Wang, Mubarak Shah

General AI

Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet the architectures remain prone to multimodal hallucinations. This structural vulnerability stems from an algorithmic flaw: the decoder ranks candidate tokens based on textual likelihood without …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

BACE: LLM-based Code Generation through Bayesian Anchored Co-Evolution of Code and Test Populations

2026-03-30 · Kaushitha Silva, Srinath Perera

General AI

Large Language Models (LLMs) have demonstrated impressive capabilities in code generation. While an interactive feedback loop can improve performance, writing effective tests is a non-trivial task. Early multi-agent frameworks, such as AgentCoder, automated this process but relied on generated tests as absolute ground …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

Gen-Searcher: Reinforcing Agentic Search for Image Generation

2026-03-30 · Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue

General AI

Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we pres…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

HandX: Scaling Bimanual Motion and Interaction Generation

2026-03-30 · Zimu Zhang, Yucheng Zhang, Xiyan Xu, Ziyin Wang, Sirui Xu, Kai Zhou, Bing Zhou, Chuan Guo, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui

General AI

Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior, finger articulation, contact timing, and inter-hand coordination, and existing resources lack high-fidelity bimanual …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

An Empirical Study of Multi-Agent Collaboration for Automated Research

2026-03-31 · Yang Shen, Zhenyi Yi, Ziyi Zhao, Lijun Sun, Dongyang Li, Chin-Teng Lin, Yuhui Shi

Research Track A · General AI

As AI agents evolve, the community is rapidly shifting from single Large Language Models (LLMs) to Multi-Agent Systems (MAS) to overcome cognitive bottlenecks in automated research. However, the optimal multi-agent coordination framework for these autonomous agents remains largely unexplored. In this paper, we present …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

2026-04-06 · Lei Zhang, Junjiao Tian, Zhipeng Fan, Kunpeng Li, Jialiang Wang, Weifeng Chen, Markos Georgopoulos, Felix Juefei-Xu, Yuxiang Bao, Julian McAuley, Manling Li, Zecheng He

General AI

Humans paint images incrementally: they plan a global layout, sketch a coarse draft, inspect, and refine details, and most importantly, each step is grounded in the evolving visual states. However, can unified multimodal models trained on text-image interleaved datasets also imagine the chain of intermediate states? In…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control

2026-04-07 · Yuchi Wang, Haiyang Yu, Weikang Bian, Jiefeng Long, Xiao Liang, Chao Feng, Hongsheng Li

General AI

MLLMs have been successfully applied to multimodal embedding tasks, yet their generative reasoning capabilities remain underutilized. Directly incorporating chain-of-thought reasoning into embedding learning introduces two fundamental challenges. First, structural misalignment between instance-level reasoning and pairw…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

2026-04-07 · Komal Kumar, Aman Chadha, Salman Khan, Fahad Shahbaz Khan, Hisham Cholakkal

General AI

The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) have demonstrated strong potential for understanding user intent and are being trained to utilize vari…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

Towards Long-horizon Agentic Multimodal Search

2026-04-14 · Yifan Du, Zikang Liu, Jinbiao Peng, Jie Wu, Junyi Li, Jinyang Li, Wayne Xin Zhao, Ji-Rong Wen

General AI

Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managing the heterogeneous information and high token costs associated with multimodal inputs over long horizons remains a critical challenge, as existing methods often suffe…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

2026-04-20 · Jinghui Lu, Jiayi Guan, Zhijian Huang, Jinlong Li, Guang Li, Lingdong Kong, Yingyan Li, Han Wang, Shaoqing Xu, Yuechen Luo, Fang Li, Chenxu Dang, Junli Wang, Tao Xu, Jing Wu, Jianhua Wu, Xiaoshuai Hao, Wen Zhang, Tianyi Jiang, Lingfeng Zhang, Lei Zhou, Yingbo Tang, Jie Wang, Yinfeng Gao, Xizhou Bu, Haochen Tian, Yihang Qiu, Feiyang Jia, Lin Liu, Yigu Ge, Hanbing Li, Yuannan Shen, Jianwei Cui, Hongwei Xie, Bing Wang, Haiyang Sun, Jingwei Zhao, Jiahui Huang, Pei Liu, Zeyu Zhu, Yuncheng Jiang, Zibin Guo, Chuhong Gong, Hanchao Leng, Kun Ma, Naiyang Wang, Guang Chen, Kuiyuan Yang, Hangjun Ye, Long Chen

General AI

Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive for real-time deployment. Latent CoT methods attempt to close this gap by compressing reasoning into continuous hidden states, …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

Self-Aware Vector Embeddings for Retrieval-Augmented Generation: A Neuroscience-Inspired Framework for Temporal, Confidence-Weighted, and Relational Knowledge

2026-04-22 · Naizhong Xu

Research Track A · General AI

Modern retrieval-augmented generation (RAG) systems treat vector embeddings as static, context-free artifacts: an embedding has no notion of when it was created, how trustworthy its source is, or which other embeddings depend on it. This flattening of knowledge has a measurable cost: recent work on VersionRAG reports t…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding

2026-04-24 · Lihao Zheng, Zhenwei Shao, Yu Zhou, Yan Yang, Xintian Shen, Jiawei Chen, Hao Ma, Tao Wei

General AI

Although Multimodal Large Language Models (MLLMs) have advanced rapidly, they still face notable challenges in fine-grained multi-image understanding, often exhibiting spatial hallucination, attention leakage, and failures in object constancy. In addition, existing approaches typically rely on expensive human annotatio…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning

2026-04-24 · Jichao Wang, Liuyang Bian, Yufeng Zhou, Han Xiao, Yue Pan, Guozhi Wang, Hao Wang, Zhaoxiong Wang, Yafei Wen, Xiaoxin Chen, Shuai Ren, Lingfang Zeng

Research Track B · General AI

As Multimodal Large Language Models (MLLMs) mature, GUI agents are evolving from static interactions to complex navigation. While Reinforcement Learning (RL) has emerged as a promising paradigm for training MLLM agents on dynamic GUI tasks, its effective application faces a dilemma. Standard Offline RL often relies on …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

2026-04-27 · Xihang Wang, Zihan Wang, Chengkai Huang, Quan Z. Sheng, Lina Yao

General AI

Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides s…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

MEMCoder: Multi-dimensional Evolving Memory for Private-Library-Oriented Code Generation

2026-04-27 · Mofei Li, Taozhi Chen, Guowei Yang, Jia Li

Research Track A · General AI

Large Language Models (LLMs) excel at general code generation, but their performance drops sharply in enterprise settings that rely on internal private libraries absent from public pre-training corpora. While Retrieval-Augmented Generation (RAG) offers a training-free alternative by providing static API documentation, …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

Contextual Agentic Memory is a Memo, Not True Memory

2026-04-30 · Binyan Xu, Xilin Dai, Kehuan Zhang

Research Track A · General AI

Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error with provable consequences for agent capability, long-term learning, and security. Retrie…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning

2026-04-30 · Sudong Wang, Weiquan Huang, Xiaomin Yu, Zuhao Yang, Hehai Lin, Keming Wu, Chaojun Xiao, Chen Chen, Wenxuan Wang, Beier Zhu, Yunjian Zhang, Chengwei Qin

General AI

The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (RLVR). However, SFT introduces distributional drift that neither preserves the model's original capabilities nor faithfully matc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

2026-06-09 · Yunhan Jiang, Wenbin Duan, Shasha Guo, Liang Pang, Xiaoqian Sun, Huawei Shen

General AI

Memory is essential for enabling large language model (LLM) agents to handle long-horizon reasoning tasks. Existing memory mechanisms are largely centralized, typically organizing retrieved information and interaction history within a single model context. This design imposes a fundamental trade-off: scaling reasoning …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.3

Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models

2026-06-09 · Peiqi Jia, Haonan Jia, Ziqi Miao, Linkang Du, Yuntao Wang, Zhou Su

General AI

With the widespread deployment of Multimodal Large Language Models (MLLMs) in social interaction, understanding and controlling their behavior under complex personality conditions is essential. This paper introduces explicit personality conditioning and establishes a systematic evaluation framework encompassing single-…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 16.2

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

2026-05-28 · Tianpeng Bu, Xin Liu, Qihua Chen, Hao Jiang, Shurui Li, Hongtao Duan, Lu Jiang, Lulu Hu, Bin Yang, Minying Zhang

Research Track B · General AI

While GUI agents have advanced rapidly, they often lack the robustness to recover from their own errors, hindering real-world deployment. To bridge this gap at both the evaluation and data levels, we introduce GUI-RobustEval and propose Robustness-driven Trajectory Synthesis. GUI-RobustEval contains 1,216 executable te…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.2

EG-VQA: Benchmarking Verifiable Video Question Answering with Grounded Temporal Evidence

2026-06-23 · Linpeng Huang, Weixing Chen, Zexin Chen, Yang Liu, Liang Lin

General AI

Recent advances in Video Large Language Models (Video-LLMs) have yielded promising performance on video question answering (VideoQA). Nevertheless, existing benchmarks are predominantly evaluated through answer correctness, while the grounding of predictions in relevant video evidence remains largely unexamined. This d…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.2

IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation

2026-06-23 · Zixuan Li, Haokun Lin, Yicheng Xiao, Zhiwei Li, Xinyang Song, Zelong Zheng, Yong He, Heng Yao, Ke Ding, Chao Yu, Chuan Yuan, Qi Li, Zhenan Sun

General AI

Unified multi-modal large language models (MLLMs) have achieved strong text-to-image generation quality, but still struggle with structure-aware prompt following, where object counts, spatial relations, attribute bindings, and coarse layouts must be preserved. We attribute this limitation in part to the entanglement of…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.2

SpeechEQ: Benchmarking Emotional Intelligence Quotient in Socially Aware Voice Conversational Models

2026-06-24 · Liang-Yuan Wu, Zih-Ching Chen, Tongshuang Wu, Chao-Han Huck Yang, Hua Shen

General AI

As multimodal conversational systems increasingly engage in spoken interaction, their ability to navigate paralinguistic social cues has become a critical bottleneck for natural human-AI communication. However, existing evaluations of machine emotional intelligence assess reasoning exclusively through isolated text or …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

Non-Equilibrium Stochastic Dynamics as a Unified Framework for Insight and Repetitive Learning: A Kramers Escape Approach to Continual Learning

2026-04-05 · Gunn Kim

Research Track A · General AI

Continual learning in artificial neural networks is fundamentally limited by the stability-plasticity dilemma: systems that retain prior knowledge tend to resist acquiring new knowledge, and vice versa. Existing approaches, most notably elastic weight consolidation (EWC), address this empirically without a physical ac…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

Continual Visual Anomaly Detection on the Edge: Benchmark and Efficient Solutions

2026-04-07 · Manuel Barusco, Francesco Borsatti, David Petrovic, Davide Dalle Pezze, Gian Antonio Susto

Research Track A · General AI

Visual Anomaly Detection (VAD) is a critical task for many applications including industrial inspection and healthcare. While VAD has been extensively studied, two key challenges remain largely unaddressed in conjunction: edge deployment, where computational resources are severely constrained, and continual learning, w…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

ELC: Evidential Lifelong Classifier for Uncertainty Aware Radar Pulse Classification

2026-04-08 · Mohamed Rabie, Chinthana Panagamuwa, Konstantinos G. Kyriakopoulos

Research Track A

Reliable radar pulse classification is essential in Electromagnetic Warfare for situational awareness and decision support. Deep Neural Networks have shown strong performance in radar pulse and RF emitter recognition; however, on their own they struggle to efficiently learn new pulses and lack mechanisms for expressing…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

Leveraging Complementary Embeddings for Replay Selection in Continual Learning with Small Buffers

2026-04-09 · Danit Yanowsky, Daphna Weinshall

Research Track A · General AI

Catastrophic forgetting remains a key challenge in Continual Learning (CL). In replay-based CL with severe memory constraints, performance critically depends on the sample selection strategy for the replay buffer. Most existing approaches construct memory buffers using embeddings learned under supervised objectives. Ho…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

Towards a Data-Parameter Correspondence for LLMs: A Preliminary Discussion

2026-04-19 · Ou Wu

Research Track A · General AI

Large language model optimization has historically bifurcated into isolated data-centric and model-centric paradigms: the former manipulates involved samples through selection, augmentation, or poisoning, while the latter tunes model weights via masking, quantization, or low-rank adaptation. This paper establishes a un…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

Incremental learning for audio classification with Hebbian Deep Neural Networks

2026-04-20 · Riccardo Casciotti, Francesco De Santis, Alberto Antonietti, Annamaria Mesaros

Research Track A

The ability of humans for lifelong learning is an inspiration for deep learning methods and in particular for continual learning. In this work, we apply Hebbian learning, a biologically inspired learning process, to sound classification. We propose a kernel plasticity approach that selectively modulates network kernels…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery

2026-05-02 · Wenhao Li, Xiu Su, Yichao Cao, Hongyan Xu, Xiaobo Xia, Shan You, Yi Chen, Chang Xu

Research Track A · General AI

Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning capability, lack of status monitoring, and difficulty in self-correction. In this…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 16.0

PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

2026-05-09 · Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, Christopher G. Brinton

General AI

Large language model (LLM) agents face a structural tension: cloud agents provide strong reasoning but expose user data, while on-device agents preserve privacy at the cost of overall capability. Existing device-cloud designs treat this boundary as a compute split rather than a trust boundary suited to agentic workload…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks

2026-05-12 · Minjong Cheon

Research Track A · General AI

Catastrophic forgetting remains the central obstacle in continual learning (CL): parameters shared across tasks interfere with one another, and existing regularization methods such as EWC and SI apply uniform penalties without awareness of which input region a parameter serves. We propose KAN-CL, a continual learning f…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

2026-05-12 · Neha Verma, Nikhil Mehta, Shao-Chuan Wang, Naijing Zhang, Alicia Tsai, Li Wei, Lukasz Heldt, Lichan Hong, Ed Chi, Xinyang Yi

Research Track A · General AI

Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates and addresses this challenge in the context of the Generative Retrieval (GenRetrieval) t…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning

2026-05-20 · Kei Hiroshima, Kento Uchida, Shinichi Shirakawa

Research Track A · General AI

Continual learning (CL) aims to train models sequentially on multiple tasks while mitigating catastrophic forgetting of previously learned knowledge. Recent advances in large pre-trained models (LPMs) and model merging techniques, such as MAGMAX, have demonstrated effective CL performance by combining task-specific par…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

CLANE: Continual Learning of Actions on Neuromorphic Hardware from Event Cameras

2026-05-27 · Elvin Hajizada, Michael Neumeier, Edward Paxon Frady, Yulia Sandamirskaya, Axel von Arnim, Bing Li, Eyke Hüllermeier

Research Track A · General AI

Recognizing and continuously learning novel human actions without forgetting prior classes is a requirement for emerging AR/VR and robotics applications. For these applications, both on-device processing and learning are essential for privacy and low-latency adaptation. Event cameras address the efficiency of visual se…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies

2026-05-28 · Kajetan Schweighofer, Conor F. Hayes, Roberto Dailey, Risto Miikkulainen, Xin Qiu

Research Track A · General AI

Evolution Strategies (ES) has recently emerged as a competitive alternative to reinforcement learning (RL) for large language model (LLM) fine-tuning, offering advantages through simplicity, scalability, and inference-only training. However, recent work suggests that ES fine-tuning on new tasks may induce forgetting of…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 16.0

Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning

2026-06-09 · Bocheng Ju, Jianhua Wang, Chengliang Liu, Xiaolin Chang

Research Track A · General AI

Large language model unlearning aims to suppress designated undesirable knowledge while preserving benign capabilities. Many unlearning objectives focus on suppressing undesired answers, while recent target-guided variants specify replacement behavior but still leave update locality largely unconstrained. This paper in…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.9

Towards Continuous Power Forecasting: Practical Continual Learning for Real-World Energy Systems in Nonstationary Time Series

2026-06-23 · Yujiang He, Frederic Uhrweiller, Bernhard Sick

Research Track A

Power forecasting models deployed in real-world energy markets must operate under nonstationary conditions, where data distributions continually evolve due to weather variability, infrastructure upgrades, and changing consumption behaviors. In practice, these models face strict operational constraints: historical data …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.8

ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models

2026-05-12 · Chen Li, Xiaoling Hu, Songzhu Zheng, Jiawei Zhou, Chao Chen

General AI

Large language models (LLMs) often produce answers with high certainty even when they are incorrect, making reliable confidence estimation essential for deployment in real-world scenarios. Verbalized confidence, where models explicitly state their confidence in natural language, provides a flexible and user-facing unce…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.8

Unlocking the Working Memory of Large Language Models for Latent Reasoning

2026-05-28 · Lukas Aichberger, Sepp Hochreiter

General AI

To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to autoregressive generation and thereby conflates internal computation with external communication. In contrast, human cogniti…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.8

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

2026-05-29 · Weile Chen, Bingchen Miao, Qifan Yu, Wendong Bu, Guoming Wang, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Siliang Tang

Research Track B · General AI

Recent advances in Multimodal Large Language Models (MLLMs) have led to promising progress in web agents. However, existing web agents often rely on handcrafted execution pipelines or expensive expert trajectories, limiting their adaptability to complex, dynamic environments. To address these challenges, we propose SCA…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.8

SADL: What to Ignore? A Benchmark for Subject-Aware Distractor Localization

2026-06-29 · Cao-Tri Nguyen, Nguyen-Khoa Luong, Vinh-Tiep Nguyen, Minh-Triet Tran

General AI

Photographs frequently contain visual distractors besides foregrounds and backgrounds of the intended subject, competing for attention and weakening composition. While modern editing tools streamline object removal, identifying which objects to remove remains a mostly manual process. Existing saliency models and…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.8

ASPIRE: Agentic /Skills Discovery for Robotics

2026-06-30 · Runyu Lu, Yubo Wu, Ethan Kou, Letian Fu, Wenli Xiao, Ajay Mandlekar, Yinzhen Xu, Guanya Shi, Ken Goldberg, Ang Chen, Mosharaf Chowdhury, Yuke Zhu, Linxi "Jim" Fan, Guanzhi Wang

Research Track A · General AI

Traditional robot programming is challenging: it requires orchestrating multimodal perception, managing physical contact dynamics, and handling diverse configurations and execution failures. We introduce ASPIRE (Agentic Skill Programming through Iterative Robot Exploration), a continual learning system that autonomousl…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.8

EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

2026-07-02 · Zhilin Wang, Han Song, Runzhe Zhan, Jusen Du, Jiacheng Chen, Tianle Li, Qingyu Yin, Yulun Wu, Zhennan Shen, Tong Zhu, Yanshu Li, Guanjie Chen, Derek F. Wong, Yafu Li, Yu Cheng, Yang Yang

General AI

Autonomous agents are increasingly expected to improve executable policies through feedback, yet existing evaluations often collapse this process into a final score or confound it with open-ended software-engineering progress. We introduce Autonomous Policy Evolution, a controlled evaluation setting in which a harness-…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.6

Multi-Head Recurrent Memory Agents

2026-07-01 · Jiatong Li, Samuel Yeh, Sharon Li

Research Track A · General AI

Recurrent memory agents extend LLMs to arbitrarily long contexts by iteratively consolidating input into a fixed-size memory window. Despite their scalability, these agents exhibit a well-documented reliability problem: end-to-end performance degrades systematically as context length grows. We diagnose this failure by …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.6

EAGLE-360: Embodied Active Global-to-Local Exploration in 360°

2026-07-02 · Jingtao Xu, Zizhuo Lin, Jianwen Sun, Yi Yang, Yawei Luo

General AI

While Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in standard visual understanding, adapting them for active visual search in 360° panoramic environments exposes fundamental limitations. Specifically, standard MLLMs struggle to effectively model inherent panoramic properti…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.6

Hawk: Harnessing Hardware-Aware Knowledge for High-Performance NPU Kernel Generation

2026-07-02 · Junyi Wen, Ruiyan Zhuang, Yongjia Xu, Pengtu Li, Rui Zou, Hongyi Chen, Chingman Wan, Puxu Yang, Wuhui Chen, Yanlin Wang

General AI

Developing high-performance kernels for Neural Processing Units (NPUs) is a critical industry bottleneck, requiring developers to manually navigate implicit hardware constraints and strict memory hierarchies. While large language models offer immense automation potential, they fail catastrophically on NPUs due to a fun…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.6

Learning to Evolve Scenes: Reasoning about Human Activities with Scene Graphs

2026-07-02 · Francesca Pistilli, Simone Alberto Peirone, Giuseppe Averta

General AI

Understanding human behavior while interacting with the surrounding world is crucial for many applications of embodied AI. First-person videos are particularly informative for this problem, as they well capture how activities reshape the scene over time. However, existing approaches often rely on implicit visual or lan…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.6

Many Voices, One Reward: Multi-Role Rubric Generation for LLM Judging and Reward Modeling

2026-07-02 · Dazhi Fu, Jiuding Yang, Yiwen Guo, Jicong Fan

General AI

Reliable reward and preference signals are critical for evaluating and optimizing large language models on open-ended tasks. Rubric-based judges offer a transparent way to decompose such judgments into explicit evaluation criteria, but existing annotation-free rubric generators typically rely on a single generic evalua…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

IQuest-Coder-V1 Technical Report

2026-03-17 · Jian Yang, Wei Zhang, Shawn Guo, Zhengmao Ye, Lin Jing, Shark Liu, Yizhi Li, Jiajun Wu, Cening Liu, X. Ma, Yuyang Song, Siwei Wu, Yuwen Li, L. Liao, T. Zheng, Ziling Huang, Zelong Huang, Che Liu, Yan Xing, Renyuan Li, Qingsong Cai, Hanxu Yan, Siyue Wang, Shikai Li, Jason Klein Liu, An Huang, Yongsheng Kang, Jinxing Zhang, Chuan Hao, Haowen Wang, Weicheng Gu, Ran Tao, Mingjie Tang, Peihao Wu, Jianzhou Wang, Xianglong Liu, Weifeng Lv, Bryan Dai

General AI

In this report, we introduce the IQuest-Coder-V1 series (7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic through different phases of the pipe…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

2026-03-26 · Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang, Chao Zhang, Chen Zhang, Yuhang Zang, Fei Yuan, Jiakang Yuan, Jiashuo Yu, Jinhui Yin, Haochen Ye, Qian Yao, Bowen Yang, Danni Yang, Kaichen Yang, Ziang Yan, Jun Xu, Yicheng Xu, Wanghan Xu, Xuenan Xu, Chao Xu, Ruiliang Xu, Shuhao Xing, Long Xing, Xinchen Xie, Ling-I Wu, Zijian Wu, Zhenyu Wu, Lijun Wu, Yue Wu, Jianyu Wu, Wen Wu, Fan Wu, Xilin Wei, Qi Wei, Bingli Wang, Rui Wang, Ziyi Wang, Zun Wang, Yi Wang, Haomin Wang, Yizhou Wang, Lintao Wang, Yiheng Wang, Longjiang Wang, Bin Wang, Jian Tong, Zhongbo Tian, Huanze Tang, Chen Tang, Shixiang Tang, Yu Sun, Qiushi Sun, Xuerui Su, Qisheng Su, Chenlin Su, Demin Song, Jin Shi, Fukai Shang, Yuchen Ren, Pengli Ren, Xiaoye Qu, Yuan Qu, Jiantao Qiu, Yu Qiao, Runyu Peng, Tianshuo Peng, Jiahui Peng, Qizhi Pei, Zhuoshi Pan, Linke Ouyang, Wenchang Ning, Yichuan Ma, Zerun Ma, Ningsheng Ma, Runyuan Ma, Chengqi Lyu, Haijun Lv, Han Lv, Lindong Lu, Kuikun Liu, Jiangning Liu, Yuhong Liu, Kai Liu, Hongwei Liu, Zhoumianze Liu, Mengjie Liu, Ziyu Liu, Wenran Liu, Yang Liu, Liwei Liu, Kaiwen Liu, Junyao Lin, Junming Lin, Tianyang Lin, Dahua Lin, Jianze Liang, Linyang Li, Peiji Li, Zonglin Li, Zehao Li, Pengze Li, Guoyan Li, Lingkai Kong, Linglin Jing, Zhenjiang Jin, Feifei Jiang, Qian Jiang, Junhao Huang, Zixian Huang, Haian Huang, Zhouqi Hua, Han Hu, Linfeng Hou, Yinan He, Conghui He, Tianyao He, Xu Guo, Qipeng Guo, Aijia Guo, Yuzhe Gu, Lixin Gu, Jingyang Gong, Qiming Ge, Jiaye Ge, Songyang Gao, Jianfei Gao, Xinyu Fang, Caihua fan, Yue Fan, Yanhui Duan, Zichen Ding, Shengyuan Ding, Xuanlang Dai, Erfei Cui, Ganqu Cui, Pei Chu, Tao Chu, Guangran Cheng, Yu Cheng, Kai Chen, Yongkang Chen, Chiyu Chen, Guanzhou Chen, Qiaosheng Chen, Sitao Chen, Xin Chen, Haojiong Chen, Yicheng Chen, Weihan Cao, Yuhang Cao, Qinglong Cao, Lei Bai

General AI

We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is aug…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.5

On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

2026-03-29 · Chongyang Zhao, Mingsong Li, Haodong Lu, Dong Gong

Research Track A · General AI

Multimodal Continual Instruction Tuning aims to continually enhance Large Vision Language Models (LVLMs) by learning from new data without forgetting previously acquired knowledge. Mixture of Experts (MoE) architectures naturally facilitate this by incrementally adding new experts and expanding routers while keeping th…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

2026-03-31 · Shuang Chen, Quanxin Shou, Hangting Chen, Yucheng Zhou, Kaituo Feng, Wenbo Hu, Yi-Fan Zhang, Yunlong Lin, Wenxuan Huang, Mingyang Song, Dasen Dai, Bolin Jiang, Manyuan Zhang, Shi-Xue Zhang, Zhengkai Jiang, Lucas Wang, Zhao Zhong, Yu Cheng, Nanyun Peng

General AI

Unified multimodal models provide a natural and promising architecture for understanding diverse and complex real-world knowledge while generating high-quality images. However, they still rely primarily on frozen parametric knowledge, which makes them struggle with real-world image generation involving long-tail and kn…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.5

Analytic Drift Resister for Non-Exemplar Continual Graph Learning

2026-04-03 · Lei Song, Shihan Guan, Youyong Kong

Research Track A · General AI

Non-Exemplar Continual Graph Learning (NECGL) seeks to eliminate the privacy risks intrinsic to rehearsal-based paradigms by retaining solely class-level prototype representations rather than raw graph examples for mitigating catastrophic forgetting. However, this design choice inevitably precipitates feature drift. As…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

LightThinker++: From Reasoning Compression to Memory Management

2026-04-04 · Yuqi Zhu, Jintian Zhang, Zhenjie Wan, Yujie Luo, Shuofei Qiao, Zhengke Gui, Da Zheng, Lei Liang, Huajun Chen, Ningyu Zhang

General AI

Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compr…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.5

Is Prompt Selection Necessary for Task-Free Online Continual Learning?

2026-04-06 · Seoyoung Park, Haemin Lee, Hankook Lee

Research Track A · General AI

Task-free online continual learning has recently emerged as a realistic paradigm for addressing continual learning in dynamic, real-world environments, where data arrive in a non-stationary stream without clear task boundaries and can only be observed once. To consider such challenging scenarios, many recent approaches…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.5

From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity

2026-04-09 · Zhuang Qi, Ying-Peng Tang, Lei Meng, Guoqing Chao, Lei Wu, Han Yu, Xiangxu Meng

Research Track A

Exemplar replay has become an effective strategy for mitigating catastrophic forgetting in federated continual learning (FCL) by retaining representative samples from past tasks. Existing studies focus on designing sample-importance estimation mechanisms to identify information-rich samples. However, they typically ove…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

HP-Edit: A Human-Preference Post-Training Framework for Image Editing

2026-04-21 · Fan Li, Chonghuinan Wang, Lina Lei, Yuping Qiu, Jiaqi Xu, Jiaxiu Jiang, Xinran Qin, Zhikai Chen, Fenglong Song, Zhixin Wang, Renjing Pei, Wangmeng Zuo

General AI

Common image editing tasks typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Meanwhile, although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from H…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

2026-04-21 · Bobo Li, Rui Wu, Zibo Ji, Meishan Zhang, Hao Fei, Min Zhang, Mong-Li Lee, Wynne Hsu

General AI

Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

Visual Reasoning through Tool-supervised Reinforcement Learning

2026-04-21 · Qihua Dong, Gozde Sahin, Pei Wang, Zhaowei Cai, Robik Shrestha, Hao Yang, Davide Modolo

General AI

In this paper, we investigate the problem of how to effectively master tool-use to solve complex visual reasoning tasks for Multimodal Large Language Models. To achieve that, we propose a novel Tool-supervised Reinforcement Learning (ToolsRL) framework, with direct tool supervision for more effective tool-use learning.…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

Step-level Optimization for Efficient Computer-use Agents

2026-04-29 · Jinbiao Wei, Kangqi Ni, Yilun Zhao, Guo Gan, Arman Cohan

General AI

Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

2026-04-30 · Qiyao Wang, Haoran Hu, Longze Chen, Hongbo Wang, Hamid Alinejad-Rokny, Yuan Lin, Min Yang

General AI

With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially for well-structured, information-rich inputs and static execution set…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.5

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

2026-05-07 · Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan

Research Track B · General AI

The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

2026-05-28 · Ziyan Liu, Zhezheng Hao, Yeqiu Chen, Hong Wang, Jingren Hou, Ruiyi Ding, Yongkang Yang, Wence Ji, Wei Xia, Feng Liu

General AI

Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. However, existing approaches typically train these memory policies using outcome-based reinforcement learning, failing to localize where intermediate memory quality degrades. As interac…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

2026-06-02 · Zherui Yang, Fan Liu, Yansong Ning, Hao Liu

General AI

Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and lack of principled long-horizon context management, hindering their ability to accumulate reusable experience across ta…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.5

OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

2026-06-16 · Guibin Zhang, Xun Xu, Yanwei Yue, Zikun Su, Wangchunshu Zhou, Xiaobin Hu, Shuicheng Yan

Research Track A · General AI

Memory has become a standard substrate for self-evolving agents, yet retaining experience is not the same as learning how to evolve through it. Existing memory agents can store trajectories, retrieve reflections, or accumulate skills, but often lack the holistic competence to select useful experience, act on it, write …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.5

Rehearsed Multi-Agent Live Product Demonstrations with Real-Time Voice Question Answering

2026-06-29 · Rahul Khedar, Mayank Malhotra, Avinash Karn, Mouli V, Prakhar Mehrotra

Research Track B · General AI

Live product demonstrations are a recurring, high-cost activity in software organizations: a human presenter must select features, dispatch the corresponding interactions on a running application, narrate them coherently, and answer questions in real time. Existing automation addresses only fragments -- generalist brow…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.4

CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression

2026-06-23 · Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt

General AI

"Talk short. Drop grammar. Save token." This caveman style is widely promoted as a way to cut inference cost, but whether it actually saves anything depends on which channel (the user's prompt or the model's response) is being compressed. We present Cavewoman, a two-channel evaluation protocol that scores every generat…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.4

ShutterMuse: Capture-Time Photography Guidance with MLLMs

2026-06-24 · Jiayu Li, Yixiao Fang, Tianyu Hu, Wei Cheng, Ping Huang, Zheheng Fan, Gang Yu, Xingjun Ma

General AI

Real-world photography requires capture-time guidance for both camera framing and subject pose. Yet existing aesthetic cropping benchmarks mainly evaluate post-hoc crop prediction and overlook subject-side recommendations, leaving the capture-time guidance capabilities of multimodal large language models (MLLMs) undere…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Dynamic Dual-Granularity Skill Bank for Agentic RL

2026-03-30 · Songjun Tu, Chengdong Xu, Qichao Zhang, Yaocheng Zhang, Xiangyuan Lan, Linjing Li, Dongbin Zhao

General AI

Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that o…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

The Triadic Cognitive Architecture: Bounding Autonomous Action via Spatio-Temporal and Epistemic Friction

2026-03-31 · Davide Di Gioia

General AI

Current autonomous AI agents, driven primarily by Large Language Models (LLMs), operate in a state of cognitive weightlessness: they process information without an intrinsic sense of network topology, temporal pacing, or epistemic limits. Consequently, heuristic agentic loops (e.g., ReAct) can exhibit failure modes in …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Novel Memory Forgetting Techniques for Autonomous AI Agents: Balancing Relevance and Efficiency

2026-04-02 · Payal Fofadiya, Sunil Tiwari

Research Track A · General AI

Long-horizon conversational agents require persistent memory for coherent reasoning, yet uncontrolled accumulation causes temporal decay and false memory propagation. Benchmarks such as LOCOMO and LOCCO report performance degradation from 0.455 to 0.05 across stages, while MultiWOZ shows 78.2% accuracy with 6.8% false …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Stop Wandering: Efficient Vision-Language Navigation via Metacognitive Reasoning

2026-04-02 · Xueying Li, Feng Lyu, Hao Wu, Mingliu Liu, Jia-Nan Liu, Guozi Liu

General AI

Training-free Vision-Language Navigation (VLN) agents powered by foundation models can follow instructions and explore 3D environments. However, existing approaches rely on greedy frontier selection and passive spatial memory, leading to inefficient behaviors such as local oscillation and redundant revisiting. We argue…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Rethinking Model Efficiency: Multi-Agent Inference with Large Models

2026-04-06 · Sixun Dong, Juhua Hu, Steven Li, Wei Wen, Qi Qian

General AI

Most vision-language models (VLMs) apply a large language model (LLM) as the decoder, where the response tokens are generated sequentially through autoregression. Therefore, the number of output tokens can be the bottleneck of the end-to-end latency. However, different models may require vastly different numbers of out…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery

2026-04-09 · Yifang Wang, Rui Sheng, Erzhuo Shao, Yifan Qian, Haotian Li, Nan Cao, Dashun Wang

General AI

Large language models (LLMs) are transforming scientific workflows, not only through their generative capabilities but also through their emerging ability to use tools, reason about data, and coordinate complex analytical tasks. Yet in most human-AI collaborations, the primary outputs, figures, are still treated as sta…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Memory as Metabolism: A Design for Companion Knowledge Systems

2026-04-13 · Stefan Miteski

Research Track A · General AI

Retrieval-Augmented Generation remains the dominant pattern for giving LLMs persistent memory, but a visible cluster of personal wiki-style memory architectures emerged in April 2026 -- design proposals from Karpathy, MemPalace, and LLM Wiki v2 that compile knowledge into an interlinked artifact for long-term use by a …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Cycle-Consistent Search: Question Reconstructability as a Proxy Reward for Search Agent Training

2026-04-14 · Sohyun An, Shuibenyang Yuan, Hayeon Lee, Cho-Jui Hsieh, Alexander Min

General AI

Reinforcement Learning (RL) has shown strong potential for optimizing search agents in complex information retrieval tasks. However, existing approaches predominantly rely on gold supervision, such as ground-truth answers, which is difficult to scale. To address this limitation, we propose Cycle-Consistent Search (CCS)…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning

2026-04-16 · Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Huangyu Dai, Lingtao Mao, Xuxin Zhang, Chenyi Lei, Wenwu Ou

General AI

Reinforcement learning has emerged as an effective paradigm for training large language models to perform search-augmented reasoning. However, existing approaches rely on trajectory-level rewards that cannot distinguish precise search queries from vague or redundant ones within a rollout group, and collapse to a near-z…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

MCSC-Bench: Multimodal Context-to-Script Creation for Realistic Video Production

2026-04-16 · Huanran Hu, Zihui Ren, Dingyi Yang, Liangyu Chen, Qixiang Gao, Tiezheng Ge, Qin Jin

General AI

Real-world video creation often involves a complex reasoning workflow of selecting relevant shots from noisy materials, planning missing shots for narrative completeness, and organizing them into coherent storylines. However, existing benchmarks focus on isolated sub-tasks and lack support for evaluating this full proc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

2026-04-16 · Hao Gao, Shaoyu Chen, Yifan Zhu, Yuehao Song, Wenyu Liu, Qian Zhang, Xinggang Wang

General AI

High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities and the lack of cor…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

A multimodal and temporal foundation model for virtual patient representations at healthcare system scale

2026-04-20 · Andrew Zhang, Tong Ding, Sophia J. Wagner, Caiwei Tian, Ming Y. Lu, Rowland Pettit, Joshua E. Lewis, Alexandre Misrahi, Dandan Mo, Long Phi Le, Faisal Mahmood

General AI

Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents

2026-04-22 · Yuxuan Cai, Jie Zhou, Qin Chen, Liang He, Wei Li, Xin Li, Bo Zhang

Research Track A · General AI

Online lifelong learning enables agents to accumulate experience across interactions and continually improve on long-horizon tasks. However, existing methods typically treat retrieval from past experience as a passive operation, triggering it only at task initialization or after completing a step. Consequently, agents …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

2026-04-22 · Inclusion AI, Tiwei Bie, Haoxing Chen, Tieyuan Chen, Zhenglin Cheng, Long Cui, Kai Gan, Zhicheng Huang, Zhenzhong Lan, Haoquan Li, Jianguo Li, Tao Lin, Qi Qin, Hongjun Wang, Xiaomei Wang, Haoyuan Wu, Yi Xin, Junbo Zhao

General AI

We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous vi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model

2026-04-22 · Qiguang Chen, Chengyu Luan, Jiajun Wu, Qiming Yu, Yi Yang, Yizhuo Li, Jingqi Tong, Xiachong Feng, Libo Qin, Wanxiang Che

General AI

Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Nevertheless, current Olympiad-level multimodal reasoning benchmarks for these models often emphasize single-image analysis and fail to exploit contextual information across multiple images. We present OMIBench…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation

2026-04-22 · Dongding Lin, Jian Wang, Yongqi Li, Wenjie Li

General AI

Situated conversational recommendation (SCR), which utilizes visual scenes grounded in specific environments and natural language dialogue to deliver contextually appropriate recommendations, has emerged as a promising research direction due to its close alignment with real-world scenarios. Compared to traditional reco…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents

2026-04-23 · Praval Sharma

General AI

Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios. However, existing event extraction approaches have limitations: (1) closed-domain algorithms are restricted to predefined event types and thus rarely generaliz…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

2026-04-23 · Chee Wei Tan, Yuchen Wang, Shangxin Guo

General AI

This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy L…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering

2026-04-24 · Jinghong Chen, Jingbiao Mei, Guangyu Yang, Bill Byrne

General AI

A common approach to question answering with retrieval-augmented generation (RAG) is to concatenate documents into a single context and pass it to a language model to generate an answer. While simple, this strategy can obscure the contribution of individual documents, making attribution difficult and contributing to th…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling

2026-04-28 · Jianghao Lin, Zi Ling, Chenyu Zhou, Tianyi Xu, Ruoqing Jiang, Zizhuo Wang, Dongdong Ge

General AI

Optimization modeling underpins real-world decision-making in logistics, manufacturing, energy, and public services, but reliably solving such problems from natural-language requirements remains challenging for current large language models (LLMs). In this paper, we propose Agora-Opt, a modular agentic framework…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion

2026-04-28 · Guanglin Niu, Bo Li

General AI

Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. Therefore, we…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Recursive Multi-Agent Systems

2026-04-28 · Xiyuan Yang, Jiaru Zou, Rui Pan, Ruizhong Qiu, Pan Lu, Shizhe Diao, Jindong Jiang, Hanghang Tong, Tong Zhang, Markus J. Buehler, Jingrui He, James Zou

General AI

Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend this scaling principle from a single model to multi-agent systems, and ask: Can agent collaboration itself be scaled through recursion? To …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Toward Multimodal Conversational AI for Age-Related Macular Degeneration

2026-04-28 · Ran Gu, Benjamin Hou, Mélanie Hébert, Asmita Indurkar, Yifan Yang, Emily Y. Chew, Tiarnán D. L. Keenan, Zhiyong Lu

General AI

Despite strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multimodal large language models (MLLMs) integrate diagnostic predictions with clinically meaningful dialogue to support clin…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

AgentSim: A Platform for Verifiable Agent-Trace Simulation

2026-04-29 · Saber Zerhoudi, Michael Granitzer, Jelena Mitrovic

General AI

Training trustworthy agentic LLMs requires data that shows the grounded reasoning process, not just the final answer. Existing datasets fall short: question-answering data is outcome-only, chain-of-thought data is not tied to specific documents, and web-agent datasets track interface actions rather than the core retrie…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Three-Step Nav: A Hierarchical Global-Local Planner for Zero-Shot Vision-and-Language Navigation

2026-04-29 · Wanrong Zheng, Yunhao Ge, Laurent Itti

General AI

Breakthrough progress in vision-based navigation through unknown environments has been achieved by using multimodal large language models (MLLMs). These models can plan a sequence of motions by evaluating the current view at each time step against the task and goal given to the agent. However, current zero-shot Vision-…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

2026-04-30 · Yanting Wang, Chenlong Yin, Ying Chen, Jinyuan Jia

General AI

Long-context large language models (LLMs), for example Gemini-3.1-Pro and Qwen-3.5, are widely used to empower many real-world applications, such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

2026-05-01 · Yawen Qin, Ke Qiu, Qin Zhang

General AI

Dance serves as both a cultural cornerstone and a medium for personal expression, yet the rapid growth of online dance content has made personalized discovery increasingly difficult. Text-based dance retrieval offers a natural interface for users to search with choreographic intent, but it remains underexplored because…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

2026-05-01 · Arunabh Srivastava, Mohammad A. Khojastepour, Srimat Chakradhar, Sennur Ulukus

General AI

Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Using Large Language Models in Physics Education

2026-05-22 · Jonah R. Donaldson, Aliya Navaz, Konstantinos Doran, Alysta Lim, Mario Campanelli

General AI

The rapid advancement of Large Language Models (LLMs) has introduced new possibilities and challenges in physics education, necessitating rigorous evaluation of their capabilities as both problem solvers and automated assessors. This paper presents the results of three complementary studies that evaluated frontier mode…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models

2026-06-05 · Chengkai Zhang, Ziteng Liu, Junpu Wang, Zeyi Tao, Yang Wang, Sagar Chordia, Qin Huang

General AI

Targeted post-training aims to improve reasoning, math, and code without degrading strengths. Low-rank adapters are efficient but task-global; activation interventions are input-aware but often require separate probes, vectors, or inference-time steering. We introduce TALAN (Task-Aligned Latent Adaptation Networks), a …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

2026-06-11 · Seokju Cho, Ryo Hachiuma, Abhishek Badki, Hang Su, Byung-Kwan Lee, Chan Hee Song, Sifei Liu, Subhashree Radhakrishnan, Seungryong Kim, Yu-Chiang Frank Wang, Min-Hung Chen

General AI

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is bounded by the actio…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

DYNA: Dynamic Episodic Memory Networks for Augmenting Large Language Models with Temporal Knowledge Graphs in Continuous Learning

2026-06-14 · Ali Sarabadani, Mahtab Tajvidiyan

Research Track A · General AI

Large Language Models (LLMs) struggle to incorporate new knowledge without forgetting or costly retraining. We propose DYNA, a lightweight framework that augments a frozen LLM with a temporal knowledge graph where events are nodes and temporal relations are directed, timestamped edges. The graph serves as an external, …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering

2026-06-15 · Sanjay Basu

General AI

Aggregate accuracy benchmarks conceal a systematic structure in how large language models fail at electronic health record (EHR) question answering: questions requiring more inferential steps produce disproportionately more errors. Motivated by theoretical results on transformer compositionality limits, we introduce a …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter

2026-06-15 · Patomporn Payoungkhamdee, Napat Laosaengpha, Jenta Wonglertsakul, Pittawat Taveekitworachai, Pume Tuchinda, Panjapong Poobanchuen, Ekapol Chuangsuwanich, Can Udomcharoenchaikit, Samuel Cahyawijaya, Peerat Limkonchotiwat, Sarana Nutanong

General AI

Reasoning with a Code Interpreter (CI) has emerged as an effective paradigm for enhancing the reasoning capabilities of large language models (LLMs) through executable computation and iterative verification. Despite its growing adoption, the behavioral properties underlying effective code reasoning remain largely under…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Learning from the Self-future: On-policy Self-distillation for dLLMs

2026-06-16 · Yifu Luo, Zeyu Chen, Haoyu Wang, Xinhao Hu, Yuxuan Zhang, Zhizhou Sha, Shiwei Liu

General AI

On-policy self-distillation (OPSD) has proven effective for post-training large language models (LLMs), yet its application to diffusion LLMs (dLLMs) remains unexplored. Existing OPSD methods are inherently autoregressive-centric. They inject privileged information via left-to-right prefix conditioning with token-level…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act

2026-06-16 · Michèle Finck

General AI

Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

2026-06-17 · Ruida Wang, Rui Pan, Pengcheng Wang, Shizhe Diao, Tong Zhang

General AI

Enhancing the formal math reasoning capabilities of Large Language Models (LLMs) has become a key focus in both mathematical and computer science communities in recent years. While significant progress has been made in using state-of-the-art Auto-Regressive (AR) LLMs for formal theorem proving, these models suffer from…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.3

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

2026-06-17 · Leyang Shen, Yang Zhang, Xiaoyan Zhao, Chun Kai Ling, Tat-Seng Chua

General AI

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, this divide-and-conquer paradigm falls short on decision-making tasks that are also prevalent in the real world. These tas…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.2

Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

2026-06-23 · Ali Pourghasemi Fatideh, Wilder Baldwin, Maria Dhakal, Collin McMillan, Sepideh Ghanavati

General AI

LLM-based dialogue assistants have become mainstream tools for software developers, yet current evaluation benchmarks focus exclusively on functional correctness. This leaves a critical gap in assessing the quality and accuracy of these conversations when handling Non-Functional Requirements (NFRs), which are inherentl…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

Fast Spatial Memory with Elastic Test-Time Training

2026-04-08 · Ziqiao Ma, Xueyang Yu, Haoyu Zhen, Yuncong Yang, Joyce Chai, Chuang Gan

Research Track A · General AI

Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

Tree Learning: A Multi-Skill Continual Learning Framework for Humanoid Robots

2026-04-14 · Yifei Yan, Linqi Ye

Research Track A · General AI

As reinforcement learning for humanoid robots evolves from single-task to multi-skill paradigms, efficiently expanding new skills while avoiding catastrophic forgetting has become a key challenge in embodied intelligence. Existing approaches either rely on complex topology adjustments in Mixture-of-Experts (MoE) models…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

SCM: Sleep-Consolidated Memory with Algorithmic Forgetting for Large Language Models

2026-04-22 · Saish Sachin Shinde

Research Track A · General AI

We present SCM (Sleep-Consolidated Memory), a research preview of a memory architecture for large language models that draws on neuroscientific principles to address a fundamental limitation in current systems: the absence of persistent, structured, and biologically plausible memory. Existing approaches rely on truncat…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

Catastrophic Forgetting as Accessibility Collapse: A Three-Level Framework for Knowledge Persistence in Continual Learning

2026-06-04 · Ayushman Trivedi, Bhavika Melwani

Research Track A

Catastrophic forgetting is commonly interpreted as the irreversible erasure of previously acquired knowledge during sequential learning. In this work, we investigate an alternative perspective: that forgetting may arise not from complete destruction of task representations but from a loss of accessibility to preserved …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

Revisiting Prototype Rehearsal for Exemplar-Free Continual Learning: Manifold-Aware Boundary Sampling with Adaptive Class-Balanced Loss

2026-06-04 · Hongye Xu, Bartosz Krawczyk

Research Track A · General AI

Exemplar-free class-incremental learning (EFCIL) aims to acquire new classes over time without storing raw data. Historically, prototype rehearsal, which samples around stored class prototypes and mixes them with current-task data, has been a popular strategy to reduce catastrophic forgetting. However, recent drift-com…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

Two-Way Is Better Than One: Bidirectional Alignment with Cycle Consistency for Exemplar-Free Class-Incremental Learning

2026-06-04 · Hongye Xu, Bartosz Krawczyk

Research Track A · General AI

Continual learning (CL) seeks models that acquire new skills without erasing prior knowledge. In exemplar-free class-incremental learning (EFCIL), this challenge is amplified because past data cannot be stored, making representation drift for old classes particularly harmful. Prototype-based EFCIL is attractive for its…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

Continual Quadruped Robots Coordination via Semantic Skill Discovery

2026-06-06 · Daoqing Wang, Yuchen Xiao, Weixuan Huang, Zhilong Zhang, Shenghua Wan, Meng Li, Lei Yuan, Yang Yu

Research Track A · General AI

Multi-quadruped coordination has attracted increasing attention due to its enhanced payload capacity, broader contact coverage, and improved adaptability to challenging tasks. Existing methods for multi-quadruped manipulation typically focus on predefined or closed task families, often relying on multi-agent reinforcem…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 15.0

LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis

2026-06-11 · Shihao Xu, Tiancheng Zhou, Jiatong Ma, Yanli Ding, Yiming Yan, Ming Xiao, Guoyi Li, Haiyang Geng, Yunyun Han, Jianhua Chen, Yafeng Deng

General AI

Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment. Progress in AI-assisted psychiatric diagnosis is constrained by the absence of benchmarks that simu…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

MOSAIC: Modality-Specific Adaptation for Incremental Continual Learning in Parkinson's Disease Gait Assessment

2026-06-11 · Minlin Zeng, Zhipeng Zhou, Yang Qiu, Martin J. McKeown, Zhiqi Shen

Research Track A · General AI

Gait-based Parkinson's disease assessment increasingly relies on heterogeneous sensors, but clinical systems rarely collect all modalities simultaneously. New sensors may arrive through device upgrades, protocol changes, or multi-center deployment, while historical patient data are often unavailable because of privacy …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning

2026-06-11 · Ayushman Trivedi, Bhavika Melwani

Research Track A · General AI

Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverability in continual learning. Using Split CIFAR-100 and a sequentially trained ResNet-18, we analyze …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

Understanding Cross-Modal Contributions in Continual Vision-Language Models: A Theoretical Perspective

2026-06-12 · Salimeh Sekeh, Mary Wisell

Research Track A · General AI

Continual vision-language models are commonly addressed through sequential fine-tuning; however, although this paradigm enables adaptation to new environments (tasks), it inherently emphasizes the contribution of previously learned environments (tasks) at the expense of the stability required to preserve previously acq…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 15.0

Few-Shot Domain Incremental Learning via Continual Vision-Language Consolidation

2026-06-29 · Naeem Paeedeh, Mahardhika Pratama, Wolfgang Mayer, Mukesh Prasad, Weiping Ding, Yew-Soon Ong

Research Track A · General AI

Existing domain-incremental learning (DIL) strategies call for massive amounts of data to adapt to new domains and suffer from the overfitting problem in the case of data scarcity. This paper puts forward a relatively uncharted problem, namely, few-shot domain incremental learning (FSDIL), taking into account the probl…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.9

CFAgentBench: A Reproducible Environment and Benchmark for Autonomous Construction-Finance Agents

2026-06-20 · Rishi Srivastava

Research Track B · General AI

We introduce CFAgentBench, a reproducible, self-hostable environment and benchmark for autonomous construction-finance agents: a CFO/controller-class agent operating across the real software stack a US construction finance team runs - ERP, project management, email, documents, pay applications, payroll, certified payro…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

2026-05-07 · Mingwei Xu, Hao Fang

General AI

Reinforcement learning with verifiable rewards (RLVR), due to the deterministic verification, becomes a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community witnesses the rapid change from the Proximal Policy Optimization (PPO) to Group Relative Policy Optimization (GRPO)…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation

2026-05-07 · Ziyu Zhai, Siyou Li, Juexi Shao, Juntao Yu

General AI

Developing ceramic glazes is a costly, time-consuming process of trial and error due to complex chemistry, placing a significant burden on independent artists. While recent advances in multimodal AI offer a modern solution, the field lacks the large-scale datasets required to train these models. We propose GlazyBench, …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

SkillOS: Learning Skill Curation for Self-Evolving Agents

2026-05-07 · Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee

General AI

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existin…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

2026-05-12 · Yanting Miao, Yutao Sun, Dexin Wang, Mengyu Zhou, Pascal Poupart, Lei Lv, Qi Zhao, Li Wang, Hao Li, Xiaoxi Jiang, Guanjun Jiang

General AI

Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding external tools or image generators. However, existing methods usually follow an output-as-input latent paradigm and yield unstable gains. We identify evidence for a feature-space mism…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

2026-05-12 · Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran, Valeriu Lacatusu, Sylvestre-Alvise Rebuffi, Alexandre Mourachko, Surya Parimi, Christophe Ropers, Rashel Moritz, Vanessa Stark, Hady Elsahar, Pierre Fernandez

General AI

We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as specula…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models

2026-05-21 · Ruofan Jin, Zaixi Zhang

General AI

Vision-Language-Action (VLA) models have emerged as a promising paradigm for robotic manipulation by leveraging pre-trained vision-language representations. However, current VLA training methods suffer from two critical limitations: poor generalization to novel environments and low training efficiency requiring extensi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

ETCHR: Editing To Clarify and Harness Reasoning

2026-05-22 · Beichen Zhang, Yuhong Liu, Jinsong Li, Yuhang Zang, Jiaqi Wang, Dahua Lin

General AI

Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grained focus or view transformations. The "think with images" paradigm narrows this gap, but existing approaches are either constrained by fixed predefined toolk…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

LoMo: Local Modality Substitution for Deeper Vision-Language Fusion

2026-05-28 · Feng Han, Zhixiong Zhang, Zheming Liang, Yibin Wang, Jiaqi Wang

General AI

Vision-Language Models (VLMs) have achieved substantial progress across a wide range of understanding and reasoning tasks, driven by large-scale image-text training aimed at multimodal fusion. Ideally, replacing a textual question with its rendered-image counterpart should leave model performance essentially unaffected…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings

2026-05-28 · Valentina Bui Muti, Eugénie Dulout, Ziquan Fu

General AI

Large language models (LLMs) show promise for clinical reasoning and decision support, but evaluation in realistic, electronic health record-congruent settings remains limited. Existing benchmarks often rely on static datasets or unstructured inputs that do not reflect the structured, interoperable data formats used in…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs

2026-05-28 · Shuaidi Wang, Zhan Zhuang, Ruping Huang, Yu Zhang

General AI

Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive generative paradigm. Given the prohibitive computational cost of full fine-tuning, Parameter-Efficient Fine-Tuning (PEFT) has become the standard approach. However, existing PEFT methods (e.g., LoRA), originally tailored for autoregr…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents

2026-05-28 · Amrita Mazumdar, Seonwook Park, Rajarshi Roy, Nikhil Srihari, Shengze Wang, Yuhao Zhou, Julia Wang, Koki Nagano, Shalini De Mello

General AI

Natural human conversation is full-duplex and audio-visual: people simultaneously speak and listen while continuously interpreting and producing nonverbal cues, such as nods, smiles, and gestures. To support successful human-agent interaction, agents must model full-duplex audiovisual conversation; however, existing fu…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

Multi-Agent Transactive Memory

2026-06-18 · To Eun Kim, Xuhong He, Dishank Jain, Ambuj Agrawal, Negar Arabzadeh, Fernando Diaz

Research Track B · General AI

The decentralized deployment of LLM agents with diverse capabilities across diverse tasks motivates infrastructure for knowledge sharing across heterogeneous agent populations. Just as search engines index human-generated artifacts to support human problem solving, retrieval systems can organize agent-generated artifac…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

GROW²: Grounding Which and Where for Robot Tool Use

2026-06-29 · Yuhong Deng, Yuyao Liu, David Hsu

General AI

Can the robot use a plate to cut a cake if no knife is available? Tool use greatly expands robot capabilities, but to use tools creatively beyond their intended functions, the robot faces the challenge of open-world affordance grounding: select an open-category object to act as a tool and localize its specif…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

Toward Secure and Reliable PDDL Formalization of Large Language Models with Planner-in-the-Loop Feedback

2026-06-29 · Jiamei Jiang, Jiajing Zhang, Feifei Mo, Linjing Li, Daniel Zeng

General AI

Planning often requires symbolic specifications that are both executable and verifiable. For large language models deployed in autonomous or decision-support systems, failures in such formalization may lead to unverifiable decisions, execution failures, or unsafe downstream behavior. We present NL-PDDL-Bench, a multi-d…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

ISM: Self-Improving Strategy Memory for Continual Mathematical Reasoning

2026-06-30 · Prakhar Dixit, Tim Oates

Research Track A · General AI

We propose Intelligent Schema Memory (ISM), a self-evolving memory-augmented system that improves mathematical reasoning for a frozen LLM under continual learning with hard episodic resets. ISM maintains a compact, self-refined bank of strategy schemas learned from both successful and failed episodes, with symbolic too…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.8

Knowledge Distillation from Large Reasoning Models to Compact Student Models: A Case Study on the John O'Bryan Mathematics Competition

2026-06-30 · Gaurab Baral, Aaditya Khanal, Yangyang Tao, Junxiu Zhou

General AI

This paper investigates knowledge distillation from a large reasoning model (DeepSeek-R1) to a compact student model (Qwen2.5-7B). Using historical problems from the John O'Bryan Mathematics Competition at Northern Kentucky University (2011-2025), we build a Chain-of-Thought (CoT) training corpus through a dual-agent f…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.6

AgentsCAD: Automated Design for Manufacturing of FDM Parts via Multi-Agent LLM Reasoning and Geometric Feature Recognition

2026-07-02 · Emmanuel George, Christopher Keefe, Peter Pak, Amir Barati Farimani

General AI

Parts manufactured with Fused Deposition Modeling (FDM) often require Design for Additive Manufacturing (DFAM) modifications to ensure printability, structural integrity, and reduced post-processing. Current slicers identify defects such as steep overhangs but are unable to modify the underlying geometry. This work pre…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.5

Lifelong Embodied Navigation Learning

2026-03-06 · Xudong Wang, Jiahua Dong, Baichen Liu, Qi Lyu, Lianqing Liu, Zhi Han

Research Track A · General AI

Embodied navigation agents powered by large language models have shown strong performance on individual tasks but struggle to continually acquire new navigation skills, which suffer from catastrophic forgetting. We formalize this challenge as lifelong embodied navigation learning (LENL), where an agent is required to a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.5

Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

2026-03-15 · Jiayuan Du, Yuebing Song, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu, Qijun Chen

Research Track A · General AI

End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

2026-03-26 · Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Yuanheng Zhu, Dongbin Zhao

General AI

On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matching to a one-token sig…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

Towards a Medical AI Scientist

2026-03-30 · Hongtao Wu, Boyun Zheng, Dingjie Song, Yu Jiang, Jianfeng Gao, Lei Xing, Lichao Sun, Yixuan Yuan

General AI

Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, limiting their applicability to clinical medicine, where research is required to be…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning

2026-04-02 · Yang Zhou, Xiaofeng Wang, Hao Shao, Letian Wang, Guosheng Zhao, Jiangnan Shao, Jiagang Zhu, Tingdong Yu, Zheng Zhu, Guan Huang, Steven L. Waslander

General AI

Recently, world-action models (WAM) have emerged to bridge vision-language-action (VLA) models and world models, unifying their reasoning and instruction-following capabilities and spatio-temporal world modeling. However, existing WAM approaches often focus on modeling 2D appearance or latent representations, with limi…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

2026-04-02 · Difan Jiao, Qianfeng Wen, Blair Yang, Zhenwei Tang, Ashton Anderson

General AI

We introduce ThinkTwice, a simple two-phase framework that jointly optimizes LLMs to solve reasoning problems and refine the answers, based on Group Relative Policy Optimization (GRPO). In each pair of training steps, ThinkTwice first optimizes the model on solving reasoning problems, then optimizes it on refining its …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.5

WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark

2026-04-13 · Peng Yuan, Yuyang Yin, Yuxuan Cai, Zheng Wei

Research Track B · General AI

Existing browser agent benchmarks face a fundamental trilemma: real-website benchmarks lack reproducibility due to content drift, controlled environments sacrifice realism by omitting real-web noise, and both require costly manual curation that limits scalability. We present WebForge, the first fully automated framewor…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance

2026-04-14 · Linhao Yu, Tianmeng Yang, Siyu Ding, Renren Jin, Naibin Gu, Xiangzhao Hao, Shuaiyi Nie, Deyi Xiong, Weichong Yin, Yu Sun, Hua Wu

General AI

RLVR improves reasoning in large language models, but its effectiveness is often limited by severe reward sparsity on hard problems. Recent hint-based RL methods mitigate sparsity by injecting partial solutions or abstract templates, yet they typically scale guidance by adding more tokens, which introduce redundancy, i…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

2026-04-14 · NVIDIA, Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye, Abhibha Gupta, Abhilash Somasamudramath, Abhinav Khattar, Adeola Adesoba, Adi Renduchintala, Adil Asif, Aditya Agrawal, Aditya Vavre, Ahmad Kiswani, Aishwarya Padmakumar, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Gronskiy, Alex Kondratenko, Alex Neefus, Alex Steiner, Alex Yang, Alexander Bukharin, Alexander Young, Ali Hatamizadeh, Ali Taghibakhshi, Alina Galiautdinova, Alisa Liu, Alok Kumar, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Anahita Bhiwandiwalla, Ananth Subramaniam, Andrew Tao, Anjaney Shrivastava, Anjulie Agrusa, Ankur Srivastava, Ankur Verma, Ann Guan, Anna Shors, Annamalai Chockalingam, Anubhav Mandarwal, Aparnaa Ramani, Arham Mehta, Arti Jain, Arun Venkatesan, Asha Anoosheh, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asli Sabanci Demiroz, Asma Kuriparambil Thekkumpate, Atefeh Sohrabizadeh, Avinash Kaur, Ayush Dattagupta, Barath Subramaniam Anandan, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Benjamin Chislett, Besmira Nushi, Bilal Kartal, Bill Thiede, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Branislav Kisacanin, Brian Yu, Bryan Catanzaro, Buvaneswari Mani, Carlo del Mundo, Chankyu Lee, Chanran Kim, Chantal Hwang, Chao Ni, Charles Wang, Charlie Truong, Cheng-Ping Hsieh, Chenhan Yu, Chenjie Luo, Cherie Wang, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Chris Holguin, Chris Wing, Christian Munley, Christopher Parisien, Chuck Desai, Chunyang Sheng, Collin Neale, Cyril Meurillon, Dakshi Kumar, Dan Gil, Dan Su, Dane Corneil, Daniel Afrimi, Daniel Burkhardt Eliuth Triana, Daniel Egert, Daniel Fatade, Daniel Lo, Daniel Rohrer, Daniel Serebrenik, Daniil Sorokin, Daria Gitman, Daria Levy, Darko Stosic, David Edelsohn, David Messina, David Mosallanezhad, David Tamok, Deena Donia, Deepak Narayanan, Devin O'Kelly, Dheeraj Peri, Dhruv 
Nathawani, Di Wu, Dima Rekesh, Dina Yared, Divyanshu Kakwani, Dmitry Konyagin Brandon Tuttle, Dong Ahn, Dongfu Jiang, Dorrin Poorkay, Douglas O'Flaherty, Duncan Riach, Dusan Stosic, Dustin Van Stee, Edgar Minasyan, Edward Lin, Eileen Peters Long, Elad Segal, Elena Lantz, Elena Lewis, Ellie Evans, Elliott Ning, Eric Chung, Eric Harper, Eric Pham-Hung, Eric W. Tramel, Erick Galinkin, Erik Pounds, Esti Etrog, Evan Briones, Evan Wu, Evelina Bakhturina, Evgeny Tsykunov, Ewa Dobrowolska, Farshad Saberi Movahed, Farzan Memarian, Fay Wang, Fei Jia, Felipe Soares, Felipe Vieira Frujeri, Feng Chen, Fengguang Lin, Ferenc Galko, Fortuna Zhang, Frankie Siino, Frida Hou, Gantavya Bhatt, Gargi Prasad, Geethapriya Venkataramani, Geetika Gupta, George Armstrong, Gerald Shen, Giulio Borghesi, Gordana Neskovic, Gorkem Batmaz, Grace Lam, Grace Wu, Greg Pauloski, Greyson Davis, Grigor Nalbandyan, Guoming Zhang, Guy Farber, Guyue Huang, Haifeng Qian, Haran Kumar Shiv Kumar, Harry Kim, Harsh Sharma, Hayate Iso, Hayley Ross, Herbert Hum, Herman Sahota, Hexin Wang, Himanshu Soni, Hiren Upadhyay, Huy Nguyen, Iain Cunningham, Ido Galil, Ido Shahaf, Igino Padovani, Igor Gitman, Igor Shovkun, Ikroop Dhillon, Ilya Loshchilov, Ingrid Kelly, Itamar Schen, Itay Levy, Ivan Moshkov, Izik Golan, Izzy Putterman, Jain Tu, Jan Baczek, Jan Kautz, Jane Polak Scowcroft, Janica Rosenberg, Jared Casper, Jarrod Pflum, Jason Grant, Jason Sewall, Jatin Mitra, Jeffrey Glick, Jenny Chen, Jesse Oliver, Jiacheng Xu, Jiafan Zhu, Jialin Song, Jian Zhang, Jiaqi Zeng, Jie Lou, Jill Milton, Jim Chow, Jimmy Zhang, Jinhang Choi, Jining Huang, Jocelyn Huang, Joel Caruso, Joey Conway, Joey Guman, Johan Jatko, John Kamalu, Johnny Greco, Jonathan Cohen, Jonathan Raiman, Joseph Jennings, Joyjit Daw, Juan Yu, Julio Tapia, Junkeun Yi, Jupinder Parmar, Jyothi Achar, Kari Briski, Kartik Mattoo, Katherine Cheung, Katherine Luna, Keith Wyss, Kevin Shih, Kezhi Kong, Khanh Nguyen, Khushi Bhardwaj, Kirill Buryak, Kirthi Shankar 
Sivamani, Konstantinos Krommydas, Kris Murphy, Krishna C. Puvvada, Krzysztof Pawelec, Kumar Anik, Laikh Tewari, Laya Sleiman, Leo Du, Leon Derczynski, Li Ding, Lilach Ilan, Lingjie Wu, Lizzie Wei, Luis Vega, Lun Su, Maarten Van Segbroeck, Maer Rodrigues de Melo, Magaret Zhang, Mahan Fathi, Makesh Narsimhan Sreedhar, Makesh Sreedhar, Makesh Tarun Chandran, Manuel Reyes Gomez, Maor Ashkenazi, Marc Cuevas, Marc Romeijn, Margaret Zhang, Mark Cai, Mark Gabel, Markus Kliegl, Martyna Patelka, Maryam Moosaei, Matthew Varacalli, Matvei Novikov, Mauricio Ferrato, Mehrzad Samadi, Melissa Corpuz, Meng Xin, Mengdi Wang, Mengru Wang, Meredith Price, Micah Schaffer, Michael Andersch, Michael Boone, Michael Evans, Michael Z Wang, Miguel Martinez, Mikail Khona, Mike Chrzanowski, Mike Hollinger, Mingyuan Ma, Minseok Lee, Mohammad Dabbah, Mohammad Shoeybi, Mostofa Patwary, Nabin Mulepati, Nader Khalil, Najeeb Nabwani, Nancy Agarwal, Nanthini Balasubramaniam, Narimane Hennouni, Narsi Kodukula, Natalie Hereth, Nathaniel Pinckney, Nave Assaf, Negar Habibi, Nestor Qin, Neta Zmora, Netanel Haber, Nick Reamaroon, Nickson Quak, Nidhi Bhatia, Nikhil Jukar, Nikki Pope, Nikolai Ludwig, Nima Tajbakhsh, Nir Ailon, Nirmal Juluru, Nirmalya De, Nowel Pitt, Oleg Rybakov, Oleksii Hrinchuk, Oleksii Kuchaiev, Olivier Delalleau, Oluwatobi Olabiyi, Omer Ullman Argov, Omri Almog, Omri Puny, Oren Tropp, Otavio Padovani, Ouye Xie, Parth Chadha, Pasha Shamis, Paul Gibbons, Pavlo Molchanov, Peter Belcak, Peter Jin, Pinky Xu, Piotr Januszewski, Pooya Jannaty, Prachi Shevate, Pradeep Thalasta, Pranav Prashant Thombre, Prasoon Varshney, Prerana Gambhir, Pritam Gundecha, Przemek Tredak, Qing Miao, Qiyu Wan, Quan Tran Minh, Rabeeh Karimi Mahabadi, Rachel Oberman, Rachit Garg, Rahul Kandu, Raina Zhong, Ran El-Yaniv, Ran Zilberstein, Rasoul Shafipour, Renee Yao, Renjie Pi, Richard Mazzarese, Richard Wang, Rick Izzo, Ridhima Singla, Rima Shahbazyan, Rishabh Garg, Ritika Borkar, Ritu Gala, Riyad Islam, Robert Clark, 
Robert Hesse, Roger Waleffe, Rohit Varma Kalidindi, Rohit Watve, Roi Koren, Ron Fan, Ruchika Kharwar, Ruisi Cai, Ruoxi Zhang, Russell J. Hewett, Ryan Prenger, Ryan Timbrook, Ryota Egashira, Sadegh Mahdavi, Sagar Singh Ashutosh Joshi, Sahil Modi, Samuel Kriman, Sandeep Pombra, Sanjay Kariyappa, Sanjeev Satheesh, Santiago Pombo, Saori Kaji, Satish Pasumarthi, Saurav Mishra, Saurav Muralidharan, Scott Hara, Sean Narenthiran, Sebastian Rogawski, Seonjin Na, Seonmyeong Bak, Sepehr Sameni, Seth Poulos, Shahar Mor, Shantanu Acharya, Shaona Ghosh Adam Lord, Sharath Turuvekere Sreenivas, Shaun Kotek, Shaya Gharghabi, Shelby Thomas, Sheng-Chieh Lin, Shibani Likhite, Shiqing Fan, Shiyang Chen, Shreya Gopal, Shrimai Prabhumoye, Shubham Pachori, Shubham Toshniwal, Shuo Zhang, Shuoyang Ding, Shyam Renjith, Shyamala Prayaga, Siddhartha Jain, Simeng Sun, Sirisha Rella, Sirshak Das, Smita Ithape, Sneha Harishchandra S, Somshubra Majumdar, Soumye Singhal, Sri Harsha Singudasu, Sriharsha Niverty, Stas Sergienko, Stefana Gloginic, Stefania Alborghetti, Stephen Ge, Stephen McCullough, Sugam Dipak Devare, Suguna Varshini Velury, Sukrit Rao, Sumeet Kumar Barua, Sunny Gai, Suseella Panguluri, Sushil Koundinyan, Swathi Patnam, Sweta Priyadarshi, Swetha Bhendigeri, Syeda Nahida Akter, Sylendran Arunagiri, Tailling Yuan, Talor Abramovich, Tan Bui, Tan Yu, Terry Kong, Thanh Do, Thomas Gburek, Thorgane Marques, Tiffany Moore, Tijmen Blankevoort, Tim Moon, Timothy Ma, Tiyasa Mitra, Tomasz Grzegorzek, Tomer Asida, Tomer Bar Natan, Tomer Keren, Tomer Ronen, Traian Rebedea, Trenton Starkey, Tugrul Konuk, Twinkle Vashishth, Tyler Condensa, Udi Karpas, Ushnish De, Vahid Noorozi, Vahid Noroozi, Vanshil Atul Shah, Veena Vaidyanathan, Venkat Srinivasan, Venmugil Elango, Victor Cui, Vijay Korthikanti, Vikas Mehta, Virginia Adams, Virginia Wu, Vitaly Kurin, Vitaly Lavrukhin, Vladimir Anisimov, Wan Seo, Wanli Jiang, Wasi Uddin Ahmad, Wei Du, Wei Ping, Wei-Ming Chen, Wendy Quan, Wenliang Dai, Wenwen Gao, 
Will Jennings, William Zhang, Xiaowei Ren, Xiaowen Xin, Xin Li, Yang Yu, Yangyi Chen, Yaniv Galron, Yashaswi Karnati, Yejin Choi, Yev Meyer, Yi-Fu Wu, Yian Zhang, Ying Lin, Yonatan Geifman, Yonggan Fu, Yoshi Suhara, Youngeun Kwon, Yuan Zhang, Yuki Huang, Zach Moshe, Zhilin Wang, Zhiyu Cheng, Zhongbo Zhu, Zhuolin Yang, Zihan Liu, Zijia Chen, Zijie Yan, Zuhair Ahmed

General AI

We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts arch…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.5

ReConText3D: Replay-based Continual Text-to-3D Generation

2026-04-15 · Muhammad Ahmed Ullah Khan, Muhammad Haris Bin Amir, Didier Stricker, Muhammad Zeshan Afzal

Research Track A · General AI

Continual learning enables models to acquire new knowledge over time while retaining previously learned capabilities. However, its application to text-to-3D generation remains unexplored. We present ReConText3D, the first framework for continual text-to-3D generation. We first demonstrate that existing text-to-3D model…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning

2026-04-16 · Bowen Ping, Zijun Chen, Tingfeng Hui, Qize Yu, Chenxuan Li, Junchi Yan, Baobao Chang

General AI

Reinforcement Learning (RL) has emerged as a critical driver for enhancing the reasoning capabilities of Large Language Models (LLMs). While recent advancements have focused on reward engineering or data synthesis, few studies exploit the model's intrinsic representation characteristics to guide the training process. I…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

AI scientists produce results without reasoning scientifically

2026-04-20 · Martiño Ríos-García, Nawaf Alampara, Chandan Gupta, Indrajeet Mandal, Sajid Mannan, Ali Asghar Aghajani, N. M. Anoop Krishnan, Kevin Maik Jablonka

General AI

Large language model (LLM)-based systems are increasingly deployed to conduct scientific research autonomously, yet whether their reasoning adheres to the epistemic norms that make scientific inquiry self-correcting is poorly understood. Here, we evaluate LLM-based scientific agents across eight domains, spanning workf…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

2026-04-20 · Guanting Dong, Junting Lu, Junjie Huang, Wanjun Zhong, Longxiang Liu, Shijue Huang, Zhenyu Li, Yang Zhao, Xiaoshuai Song, Xiaoxi Li, Jiajie Jin, Yutao Zhu, Hanbin Wang, Fangyu Lei, Qinyu Luo, Mingyang Chen, Zehui Chen, Jiazhan Feng, Ji-Rong Wen, Zhicheng Dou

General AI

Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limi…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

2026-04-21 · Venus Team, Sunhao Dai, Yong Deng, Jinzhen Lin, Yusheng Song, Guoqing Wang, Xiaofeng Wu, Yuqi Zhou, Shuo Yang, Zhenzhe Ying, Zhanwei Zhang, Changhua Meng, Weiqiang Wang

General AI

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open-data by improving both data quality and data utilization. We present DR…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

Discovering Agentic Safety Specifications from 1-Bit Danger Signals

2026-04-25 · Víctor Gallego

General AI

Can large language model agents discover hidden safety objectives through experience alone? We introduce EPO-Safe (Experiential Prompt Optimization for Safe Agents), a framework where an LLM iteratively generates action plans, receives sparse binary danger warnings, and evolves a natural language behavioral specificati…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.5

Sequential Learning and Catastrophic Forgetting in Differentiable Resistor Networks

2026-05-02 · Maniru Ibrahim

Research Track A

Differentiable physical networks provide a simple setting in which learning can be studied through the interaction between trainable parameters and physical equilibrium constraints. We investigate sequential learning in differentiable resistor networks governed by Kirchhoff's laws. Although individual input-output map…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.5

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

2026-05-12 · Xuhao Hu, Xi Zhang, Haiyang Xu, Kyle Qiao, Jingyi Yang, Xuanjing Huang, Jing Shao, Ming Yan, Jieping Ye

Research Track B · General AI

Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal execution paths. This diffi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.5

Unlocking Compositional Generalization in Continual Few-Shot Learning

2026-05-12 · Phu-Quy Nguyen-Lam, Phu-Hoa Pham, Dao Sy Duy Minh, Chi-Nguyen Tran, Huynh Trung Kiet, Long Tran-Thanh

Research Track A · General AI

Object-centric representations promise a key property for few-shot learning: Rather than treating a scene as a single unit, a model can decompose it into individual object-level parts that can be matched and compared across different concepts. In practice, this potential is rarely realized. Continual learners either co…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

2026-05-31 · Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao, Pengji Zhang, Yuhao Qing, Xin Yao, Dong Huang, Lin Zhang, Zhuoran Ji

General AI

Large language models are increasingly deployed as coding agents, shifting safety from individual responses to action sequences. Existing benchmarks, however, primarily assess whether models refuse unsafe prompts, leaving impacts on stateful workspaces largely unexamined. We present SABER, a benchmark for environment-a…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning

2026-06-07 · Jiahao Wang, An Ping, Yanghai Wang, Yuanxing Zhang, Shihao Li, Hanyan Bian, Yichi Ren, Yize Zhang, Han Wang, Haowen Chen, Junze Li, Jiaqi Wang, Yiyang Hu, Zhuze Xu, Zijie Zhang, Jiaheng Liu

General AI

While Omni-modal Large Language Models (OLLMs) have demonstrated impressive capabilities in jointly processing audio and visual streams, their ability to strictly adhere to complex, multi-faceted user instructions remains largely unexplored. Existing benchmarks primarily focus on holistic video understanding or text-on…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

2026-06-13 · Shubhang Bhatnagar, Dheeraj Baiju, Narendra Ahuja

General AI

Vision encoders for retrieval are typically trained with class-label supervision: each training pair reduces to a scalar that uniformly pushes the embedding apart or pulls it together, as if every visual attribute either differed or matched. A multimodal large language model (MLLM), shown the same pair, can articulate …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

2026-06-16 · Peixian Zhou, Yuxu Chen, Chaorui Zhang, Wei Han, Bo Bai, Xueyan Niu

General AI

Large language models perform increasingly well on standardized logical reasoning benchmarks, but whether this ability remains robust beyond English is unclear. We introduce ChLogic, an English-Chinese aligned benchmark that tests whether models preserve logical reasoning performance when the same latent logical struc…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

2026-06-16 · Tongxu Luo, Rongsheng Wang, Jiaxi Bi, Chenming Xu, Zhengyang Tang, Jianlong Chen, Juhao Liang, Ke Ji, Shuqi Guo, Yuhao Du, Fan Bu, Wenyu Du, Xiaotong Zhang, Kyle Li, Shaobo Wang, Linfeng Zhang, Yuxuan Liu, Xin Lai, Chenxin Li, Yiduo Guo, Zhexin Zhang, Xinyuan Wang, Tianyi Bai, Ziniu Li, Benyou Wang

General AI

Game generation is an emerging application of coding agents, requiring models to transform natural-language specifications into playable interactive systems. Unlike traditional coding tasks, game generation takes place within a game engine, where scripts, scenes, assets, rendering, and runtime interactions must jointly…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.5

PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

2026-06-16 · Yuhang Huang, Xuan Lv, Junyan Xu, Zhiyuan Yu, Jiazhao Zhang, Ruizhen Hu, Wancheng Feng, Shilong Zou, Hewen Xiao, Ziqiao Zhou, Kaiyun Huang, Zhiyu Peng, Juzhan Xu, Hang Zhao, Chenyang Zhu, Renjiao Yi, Yifei Huang, Douhui Wu, Yan Zhang, Kexu Cheng, Chunhe Song, Yunzhi Xue, Xiuhong Zhang, Leitao Guo, Yunji Chen, Bin Wu, Haibin Yu, Kai Xu

General AI

World foundation models (WFMs) are powerful simulators, yet they predominantly operate in a single-view setting and lack the multi-view 3D consistency required for robotic manipulation. While robotic systems rely on multiple cameras (egocentric, eye-to-hand, and wrist-mounted) for policy learning, current multi-view wo…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.5

When Web Agents Finish but Still Fail: Reproducible Triggers and Trace Diagnostics for Parallel Web Exploration

2026-06-16 · Aagam Sogani, Botao Rui, Swetha Vaidyanathan, Rishi Agarwal, Minghao Yan, Shivaram Venkataraman

Research Track B · General AI

Long-horizon web agents often fail in ways hidden by final-answer evaluation: they may visit useful pages, produce a well-formed answer, and terminate confidently while still missing fields, over-including unsupported items, or relying on stale evidence. We study these failures with Parallel WebBench, a parallel web-ex…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.4

Black-Box Continual Learning for Vision-Language Models

2026-06-22 · Yuting Li, Weihang Fang, Haoyuan Gao, Linghe Kong, Yexin Li, Lichao Sun, Weiran Huang

Research Track A · General AI

The rapid deployment of Vision-Language Models (VLMs) in dynamic environments necessitates the ability to learn continuously without forgetting. However, traditional continual learning (CL) settings often rely on white-box paradigms, which are increasingly invalidated by the shift toward cloud-hosted models. In this pap…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.4

DREAM: Dense Retrieval Embeddings via Autoregressive Modeling

2026-06-23 · Yixuan Tang, Yi Yang

General AI

Dense retrieval embedding models are a fundamental component of modern retrieval-based AI systems. Most dense retrievers are trained with contrastive objectives, which require labeled positive and negative document pairs that are often costly and difficult to obtain. In this work, we investigate whether the autoregress…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.4

RoPE-Aware Bit Allocation for KV-Cache Quantization

2026-06-23 · Fengfeng Liang, Yuechen Zhang, Jiaya Jia

General AI

Existing low-bit KV-cache quantizers often treat each cached key as a flat vector. Under RoPE, however, a key's contribution to a future attention logit decomposes into a position-dependent sum over two-dimensional frequency blocks. This makes key-cache quantization a block-wise bit-allocation problem: high-energy RoPE…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.4

Improved Large Language Diffusion Models

2026-06-24 · Shen Nie, Qiyang Min, Shaoxuan Xu, Zihao Huang, Yuxuan Song, Yong Shan, Yankai Lin, Wayne Xin Zhao, Chongxuan Li, Ji-Rong Wen

General AI

Modern large language models are predominantly trained with autoregressive factorization and causal attention. We present iLLaDA, an 8B masked diffusion language model trained from scratch with fully bidirectional attention. iLLaDA keeps the masked diffusion objective throughout pre-training and supervised fine-tuning …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Moving Beyond Review: Applying Language Models to Planning and Translation in Reflection

2026-03-30 · Seyed Parsa Neshaei, Richard Lee Davis, Tanja Käser

General AI

Reflective writing is known to support the development of students' metacognitive skills, yet learners often struggle to engage in deep reflection, limiting learning gains. Although large language models (LLMs) have been shown to improve writing skills, their use as conversational agents for reflective writing has prod…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Unsafe2Safe: Controllable Image Anonymization for Downstream Utility

2026-03-30 · Mih Dinh, SouYoung Jin

General AI

Large-scale image datasets frequently contain identifiable or sensitive content, raising privacy risks when training models that may memorize and leak such information. We present Unsafe2Safe, a fully automated pipeline that detects privacy-prone images and rewrites only their sensitive regions using multimodally guide…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

2026-03-31 · Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh

General AI

AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

ActionParty: Multi-Subject Action Binding in Generative Video Games

2026-04-02 · Alexander Pondaven, Ziyi Wu, Igor Gilitschenski, Philip Torr, Sergey Tulyakov, Fabio Pizzati, Aliaksandr Siarohin

General AI

Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we tackle a fundamental issue of action bin…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models

2026-04-06 · Xiangzhao Hao, Zefeng Zhang, Zhenyu Zhang, Linhao Yu, Yao Chen, Yiqian Zhang, Haiyun Guo, Shuohuan Wang, Yu Sun

General AI

Image degradation from blur, noise, compression, and poor illumination severely undermines multimodal understanding in real-world settings. Unified multimodal models that combine understanding and generation within a single architecture are a natural fit for this challenge, as their generative pathway can model the fin…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Synthetic Sandbox for Training Machine Learning Engineering Agents

2026-04-06 · Yuhang Zhou, Lizhu Zhang, Yifan Wu, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao, Hong Yan

General AI

As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becomes orders of magnitude more expensive: while SWE tasks can be verified via fast-executing unit tests, MLE verification requires running full ML pipelines -- data prepro…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Vero: An Open RL Recipe for General Visual Reasoning

2026-04-06 · Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu

General AI

What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pip…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

"I See What You Did There": Can Large Vision-Language Models Understand Multimodal Puns?

2026-04-07 · Naen Xu, Jiayi Sheng, Changjiang Li, Chunyi Zhou, Yuyuan Li, Tianyu Du, Jun Wang, Zhihui Fu, Jinbao Li, Shouling Ji

General AI

Puns are a common form of rhetorical wordplay that exploits polysemy and phonetic similarity to create humor. In multimodal puns, visual and textual elements synergize to ground the literal sense and evoke the figurative meaning simultaneously. Although Vision-Language Models (VLMs) are widely used in multimodal unders…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection

2026-04-07 · Hongxu Zhou

General AI

Intrinsic self-correction in Large Language Models (LLMs) frequently fails in open-ended reasoning tasks due to "hallucination snowballing," a phenomenon in which models recursively justify early errors during free-text reflection. While structured feedback can mitigate this issue, existing approaches often rely on e…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest

2026-04-09 · Addison J. Wu, Ryan Liu, Shuyue Stella Li, Yulia Tsvetkov, Thomas L. Griffiths

General AI

Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements. This creates the potential for LLM…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

2026-04-13 · Junlin Liu, Shengnan An, Shuang Zhou, Dan Ma, Shixiong Luo, Ying Xie, Yuan Zhang, Wenling Yuan, Yifan Zhou, Xiaoyu Li, Ziwen Wang, Xuezhi Cao, Xunliang Cai

General AI

Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts, often termed general reasoning, remains under-explored. Unlik…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Visual Preference Optimization with Rubric Rewards

2026-04-14 · Ya-Qi Yu, Fangyu Hong, Xiangyang Qu, Hao Wang, Gaojie Wu, Qiaoyu Luo, Nuo Xu, Huixin Wang, Wuheng Xu, Yongxin Liao, Zihao Chen, Haonan Li, Ziming Li, Dezhi Peng, Minghui Liao, Jihao Wu, Haoyu Ren, Dandan Tu

General AI

The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Existing pipelines often rely on off-policy perturbations or coarse outcome-based signals, which are not well suited to fine-grained visual reasoning. We propose rDP…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

2026-04-15 · Zhuofeng Li, Yi Lu, Dongfu Jiang, Haoxiang Zhang, Yuyang Bai, Chuan Li, Yu Wang, Shuiwang Ji, Jianwen Xie, Yu Zhang

Research Track A · General AI

The rapid rise in AI conference submissions has driven increasing exploration of large language models (LLMs) for peer review support. However, LLM-based reviewers often generate superficial, formulaic comments lacking substantive, evidence-grounded feedback. We attribute this to the underutilization of two key compone…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

2026-04-16 · Emanuel Tewolde, Xiao Zhang, David Guzman Piedrahita, Vincent Conitzer, Zhijing Jin

General AI

It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet recent works report the opposite trend: LLMs with stronger reasoning capabilities behave less cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings. Indeed, our exp…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Learning to Think Like a Cartoon Captionist: Incongruity-Resolution Supervision for Multimodal Humor Understanding

2026-04-16 · Hatice Merve Vural, Doga Kukul, Ege Erdem Ozlu, Demir Ekin Arikan, Bob Mankoff, Erkut Erdem, Aykut Erdem

General AI

Humor is one of the few cognitive tasks where getting the reasoning right matters as much as getting the answer right. While recent work evaluates humor understanding on benchmarks such as the New Yorker Cartoon Caption Contest (NYCC), it largely treats it as black-box prediction, overlooking the structured reasoning p…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap

2026-04-17 · Yige Xu, Yongjie Wang, Zizhuo Wu, Kaisong Song, Jun Lin, Zhiqi Shen

General AI

Reasoning in vision-language models (VLMs) has recently attracted significant attention due to its broad applicability across diverse downstream tasks. However, it remains unclear whether the superior performance of VLMs stems from genuine vision-grounded reasoning or relies predominantly on the reasoning capabilities …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text

2026-04-17 · Van-Truong Le

General AI

The complexity of Vietnam's legal texts presents a significant barrier to public access to justice. While Large Language Models offer a promising solution for legal text simplification, evaluating their true capabilities requires a multifaceted approach that goes beyond surface-level metrics. This paper introduces a co…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Information Router for Mitigating Modality Dominance in Vision-Language Models

2026-04-17 · Seulgi Kim, Mohit Prabhushankar, Ghassan AlRegib

General AI

Vision-language models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance, where predictions rely disproportionately on a single modality. Prior approaches primarily address this issue by steering the model's attention allocation, implicitly assuming…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Long-Term Memory for VLA-based Agents in Open-World Task Execution

2026-04-17 · Xu Huang, Weixin Mao, Yinhao Li, Hua Chen, Jiabao Zhao

General AI

Vision-Language-Action (VLA) models have demonstrated significant potential for embodied decision-making; however, their application in complex chemical laboratory automation remains restricted by limited long-horizon reasoning and the absence of persistent experience accumulation. Existing frameworks typically treat p…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

2026-04-20 · Xinyu Ma, Mingzhou Xu, Xuebo Liu, Chang Jin, Qiang Wang, Derek F. Wong, Min Zhang

General AI

Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial latent space. While offline teacher guidance and entropy-driven strategies have been proposed to add…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

2026-04-21 · Yi Zhong, Buqiang Xu, Yijun Wang, Zifei Shan, Shuofei Qiao, Guozhou Zheng, Ningyu Zhang

General AI

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

AgentSOC: A Multi-Layer Agentic AI Framework for Security Operations Automation

2026-04-22 · Joyjit Roy, Samaresh Kumar Singh

General AI

Security Operations Centers (SOCs) increasingly encounter difficulties in correlating heterogeneous alerts, interpreting multi-stage attack progressions, and selecting safe and effective response actions. This study introduces AgentSOC, a multi-layered agentic AI framework that enhances SOC automation by integrating pe…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Benchmarking Source-Sensitive Reasoning in Turkish: Humans and LLMs under Evidential Trust Manipulation

2026-04-27 · Sercan Karakaş, Yusuf Şimşek

General AI

This paper investigates whether source trustworthiness shapes Turkish evidential morphology and whether large language models (LLMs) track this sensitivity. We study the past-domain contrast between -DI and -mIs in controlled cloze contexts where the information source is overtly external, while only its perceived reli…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Green Shielding: A User-Centric Approach Towards Trustworthy AI

2026-04-27 · Aaron J. Li, Nicolas Sanchez, Hao Huang, Ruijiang Dong, Jaskaran Bains, Katrin Jaradeh, Zhen Xiang, Bo Li, Feng Liu, Aaron Kornblith, Bin Yu

General AI

Large language models (LLMs) are increasingly deployed, yet their outputs can be highly sensitive to routine, non-adversarial variation in how users phrase queries, a gap not well addressed by existing red-teaming efforts. We propose Green Shielding, a user-centric agenda for building evidence-backed deployment guidanc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

2026-04-27 · Yunze Xiao, Vivienne J. Zhang, Chenghao Yang, Ningshan Ma, Weihao Xuan, Jen-tse Huang

General AI

Applications based on large language models (LLMs), such as multi-agent simulations, require population diversity among agents. We identify a pervasive failure mode we term Persona Collapse: agents each assigned a distinct profile nonetheless converge into a narrow behavioral mode, producing a homogeneous simula…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

CoRE: Concept-Reasoning Expansion for Continual Brain Lesion Segmentation

2026-04-28 · Qianqian Chen, Anglin Liu, Jingyang Zhang, Yudong Zhang

Research Track A · General AI

Accurate brain lesion segmentation in MRI is vital for effective clinical diagnosis and treatment planning. Due to high annotation costs and strict data privacy regulations, universal models require employing Continual Learning (CL) to adapt to evolving clinical tasks without losing previously acquired knowledge. Howev…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

ClawGym: A Scalable Framework for Building Effective Claw Agents

2026-04-29 · Fei Bai, Huatong Song, Shuang Sun, Daixuan Cheng, Yike Yang, Chuan Hao, Renyuan Li, Feng Chang, Yuan Wei, Ran Tao, Bryan Dai, Jian Yang, Wayne Xin Zhao

General AI

Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent trai…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

2026-04-29 · Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan

General AI

Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference steps within a single architecture, none address cross-architecture knowledge t…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception

2026-04-30 · Neemias B da Silva, Rodrigo Minetto, Daniel Silver, Thiago H Silva

General AI

Large Language Models (LLMs) are increasingly used as proxies for human perception in urban analysis, yet it remains unclear whether persona prompting produces meaningful and reproducible behavioral diversity. We investigate whether distinct personas influence urban sentiment judgments generated by multimodal LLMs. Usi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Can Coding Agents Reproduce Findings in Computational Materials Science?

2026-05-01 · Ziyang Huang, Yi Cao, Ali K. Shargh, Jing Luo, Ruidong Mei, Mohd Zaki, Zhan Liu, Wyatt Bunstine, William Jurayj, Somdatta Goswami, Tyrel McQueen, Michael Shields, Jaafar El-Awady, Paulette Clancy, Benjamin Van Durme, Nicholas Andrews, William Walden, Daniel Khashabi

General AI

Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational scientific workflows, where tasks require not only strong coding ability, but also the ab…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Make Your LVLM KV Cache More Lightweight

2026-05-01 · Xihao Chen, Yangyang Guo, Roger Zimmermann

General AI

Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data

2026-06-02 · Devleena Das, Rajeev Patwari, Elliott Delaye, Ashish Sirasao

General AI

Aggressive weight quantization to 2-bit precision offers substantial throughput and memory gains for large language model (LLM) inference, but typically incurs severe accuracy degradation. These gains are particularly relevant for edge and on-device deployment, where memory capacity and bandwidth are primary constraint…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

RunAgent SuperBrowser: A Theory of Autonomous Web Navigation Grounded in Human Browsing Behaviour

2026-06-08 · Radeen Mostafa, Sawradip Saha

Research Track B · General AI

We present SUPERBROWSER, an autonomous web-navigation agent designed against a single guiding hypothesis: a web agent should browse the way a person browses. A human reading a page does not retain every pixel they have seen; they look at a few candidate targets, decide on one, and remember only what is needed to keep t…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation

2026-06-08 · Matthew Ho, Brian Liu, Jixuan Chen, Audrey Wang, Lianhui Qin

General AI

Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them can cost domain scientists hours to days. We study simulator setup as a problem of agent-tool interface grounding: what minimal simulator-specific adaptations are needed for an …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

2026-06-08 · Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov

Research Track B · General AI

A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile agent benchmarks lack this kind of personalization. We introduce iOSWorld, the first int…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory

2026-06-09 · Suozhao Ji, Baodong Wu, Zehao Wang, Lei Xia, Qingping Li, Ruisong Wang, Wenbo Ding, Zhenhua Zhu, Boxun Li, Guohao Dai, Yu Wang

General AI

Long-term LLM agents need persistent memory that can track changing facts and provide relevant evidence across sessions. Existing memory systems often store observations as isolated records, summaries, or indexed fragments, which makes evidence aggregation, fact revision, and memory maintenance difficult. We propose In…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

2026-06-09 · Wenhao Liu, Hao Shi, Yunhe Li, Weizhi Fei, Xiangyuan Wang, Mengzhe Ruan, Hanxu Hou, Peisong Wang, Linqi Song, Shuang Qiu

General AI

Long chain-of-thought (CoT) trajectories in large language model (LLM) reasoning cause severe inference bottlenecks due to rapid key-value (KV) cache growth. Current decoding-time compression methods mitigate this issue via token eviction, but typically assume a uniform budget distribution across all layers and heads. …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

2026-06-10 · Michal Chudoba, Sergey Alyaev, Petra Galuscakova, Tomasz Wiktorski

General AI

There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require modification to the co…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

A Three-Layer Framework for AI in Scientific Discovery

2026-06-11 · Guojun Liao

General AI

Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but neither fully captures the central act of discovery: the formation and evolution of models. This paper…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

2026-06-11 · Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen, Avinash Atreya, Hanjie Chen, Vicente Ordonez

General AI

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, wh…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Recursive Agent Harnesses

2026-06-11 · Elias Lumer, Sahil Sen, Kevin Paul, Vamse Kumar Subbiah

General AI

Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between these two lines of work…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

2026-06-11 · Zihao Wang, Yiming Li, Yutong Wu, Zheyu Liu, Kangjie Chen, Fok Kar Wai, Pin-Yu Chen, Vrizlynn L. L. Thing, Bo Li, Dacheng Tao, Tianwei Zhang

Research Track B · General AI

Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions th…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

LLM4RTL: Tool-Assisted LLM for RTL Generation

2026-06-13 · Jing Jin, Robert Chu, Ning Yan, Masood S. Mortazavi

General AI

Large language models (LLMs) have facilitated impressive progress in software engineering, code generation, tooling, and systems. Concurrently, a significant body of research has developed which explores a growing variety of methods and systems for applying LLMs to hardware and chip design (e.g., systems for RTL code g…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Evolution & Foundation: AI Shares Creative Control

2026-06-15 · Dylan Banarse, Stephen Todd, William Latham, Frederic Fol Leymarie

General AI

This paper investigates the creative process of automated design and artistic evaluation using an evolutionary system. We consider how a multimodal artificial intelligence (AI) model can communicate and guide a combined generative and evolutionary computational system. This creates a framework for the evolution of aest…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

2026-06-15 · Y. H. Zhou, Z. M. Ma, Y. J. Zhou, Y. T. Li, H. X. Xiang, Y. M. Cheng, T. L. Chen, K. J. Zhang, Z. H. Nan, J. H. Ni, Z. Wu, Q. Y. Pan, S. Zhang, S. Cheng, M. Y. Luo

Research Track B · General AI

SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on message-only smishing classification or expose URL and domain cues that allow models to …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports

2026-06-16 · Ahmed Ryan, Saad Sakib Noor, Md Erfan, Shaswata Mitra, Sudip Mittal, Md Rayhanur Rahman

General AI

Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) is essential for proactive defense, but historically required extensive human effort. Pre-Large Language Model (LLM) automation sped up this process, but could not resolve the complex language and mult…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

The Stanford EDGAR Filings Dataset: Reconstructing U.S. Corporate and Financial Disclosures into Layout-Faithful and Token-Efficient Pretraining Data

2026-06-16 · Nick Bettencourt, Xiaowei Ding, Kay Giesecke

General AI

As high-quality public web corpora become increasingly exhausted, clean long-context documents have become a scarce and expensive source of training data for large language models (LLMs). Existing long-context corpora are often proprietary and costly to acquire, synthetically generated, or concentrated in narrow domain…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

2026-06-16 · Wujian Peng, Lingchen Meng, Yuxuan Cai, Xianwei Zhuang, Yuhuan Yang, Rongyao Fang, Chenfei Wu, Junyang Lin, Zuxuan Wu, Shuai Bai

General AI

Unified Multimodal Modeling aims to integrate visual understanding and generation within a single system. However, existing approaches typically rely on two disparate visual tokenizers, which splits the representation space and hinders truly unified modeling. We propose UniAR, a unified autoregressive framework where a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

A Unified Framework for Efficient Remote Sensing Visual Question Answering: Adapting Dual, Hybrid, and Encoder-Decoder Architectures

2026-06-17 · Timothy Agboada, Shikha Chandel, Yadav Raj Ghimire, Leila Hashemi-Beni

General AI

Visual Question Answering (VQA) in the Remote Sensing (RS) domain presents unique challenges due to the high resolution, multi-scale object distribution, and semantic complexity of aerial imagery. While general domain Foundation Models have achieved remarkable success, their direct application to RSVQA is hindered by m…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Learning User Simulators with Turing Rewards

2026-06-17 · Yingshan Susan Wang, Cedegao E. Zhang, Linlu Qiu, Zexue He, Pengyuan Li, Alex Pentland, Roger P. Levy, Yoon Kim

General AI

Learning to simulate human users in interactive settings could advance the training of agent assistants, evaluation of personalization systems, research in the social sciences, and more. Existing approaches generally do so by training a large language model (LLM) to match a single ground truth response, either by maxim…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.3

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

2026-06-17 · Siyi Gu, Jialin Chen, Sophia Zhou, Arman Cohan, Rex Ying

General AI

Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. Distillation often relies on chain-of-thought annotations that are expensive to obtain and may themselves be noisy, incomplete, or partially incorrect; even when the final solutio…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.2

Autodata: An agentic data scientist to create high quality synthetic data

2026-06-24 · Ilia Kulikov, Chenxi Whitehouse, Tianhao Wu, Yixin Nie, Swarnadeep Saha, Eryk Helenowski, Weizhe Yuan, Olga Golovneva, Jack Lanchantin, Yoram Bachrach, Jakob Foerster, Xian Li, Han Fang, Sainbayar Sukhbaatar, Jason Weston

General AI

We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to create even stronger data. We describe the overall formulation, and a specific practical im…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Pruned Adaptation Modules: A Simple yet Strong Baseline for Continual Foundation Models

2026-03-22 · Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren

Research Track A · General AI

The continual learning literature has rapidly shifted from traditional class incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substanti…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Chameleons do not Forget: Prompt-Based Online Continual Learning for Next Activity Prediction

2026-04-01 · Marwan Hassani, Tamara Verbeek, Sjoerd van Straten

Research Track A

Predictive process monitoring (PPM) focuses on predicting future process trajectories, including next activity predictions. This is crucial in dynamic environments where processes change or face uncertainty. However, current frameworks often assume a static environment, overlooking dynamic characteristics and concept d…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

2026-04-02 · Xinlei Yu, Zhangquan Chen, Yongbo He, Tianyu Fu, Cheng Yang, Chengming Xu, Yue Ma, Xiaobin Hu, Zhe Cao, Jie Xu, Guibin Zhang, Jiale Tao, Jiayi Zhang, Siyuan Ma, Kaituo Feng, Haojie Huang, Youxing Li, Ronghao Chen, Huacan Wang, Chenglin Wu, Zikun Su, Xiaogang Xu, Kelu Yao, Kun Wang, Chen Gao, Yue Liao, Ruqi Huang, Tao Jin, Cheng Tan, Jiangning Zhang, Wenqi Ren, Yanwei Fu, Yong Liu, Yu Wang, Xiangyu Yue, Yu-Gang Jiang, Shuicheng Yan

Research Track A · General AI

Latent space is rapidly emerging as a native substrate for language-based models. While modern systems are still commonly understood through explicit token-level generation, an increasing body of work shows that many critical internal processes are more naturally carried out in continuous latent space than in human-rea…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

When Modalities Remember: Continual Learning for Multimodal Knowledge Graphs

2026-04-03 · Linyu Li, Zhi Jin, Yichi Zhang, Dongming Jin, Yuanpeng He, Haoran Duan, Gadeng Luosang, Nyima Tashi

Research Track A · General AI

Real-world multimodal knowledge graphs (MMKGs) are dynamic, with new entities, relations, and multimodal knowledge emerging over time. Existing continual knowledge graph reasoning (CKGR) methods focus on structural triples and cannot fully exploit multimodal signals from new entities. Existing multimodal knowledge grap…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

A Faster Path to Continual Learning

2026-04-13 · Wei Li, Hangjie Yuan, Zixiang Zhao, Borui Kang, Ziwei Liu, Tao Feng

Research Track A

Continual Learning (CL) aims to train neural networks on a dynamic stream of tasks without forgetting previously learned knowledge. Among optimization-based approaches, C-Flat has emerged as a promising solution due to its plug-and-play nature and its ability to encourage uniformly low-loss regions for both new and old…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Continual Learning for fMRI-Based Brain Disorder Diagnosis via Functional Connectivity Matrices Generative Replay

2026-04-15 · Qianyu Chen, Shujian Yu

Research Track A

Functional magnetic resonance imaging (fMRI) is widely used for studying and diagnosing brain disorders, with functional connectivity (FC) matrices providing powerful representations of large-scale neural interactions. However, existing diagnostic models are trained either on a single site or under full multi-site acce…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning

2026-04-16 · Peifeng Zhang, Zice Qiu, Donghua Yu, Shilei Cao, Juepeng Zheng, Yutong Lu, Haohuan Fu

Research Track A · General AI

In continual visual question answering (VQA), existing Continual Learning (CL) methods are mostly built for symmetric, unimodal architectures. However, modern Vision-Language Models (VLMs) violate this assumption, as their trainable components are inherently asymmetric. This structural mismatch renders VLMs highly pron…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

CI-CBM: Class-Incremental Concept Bottleneck Model for Interpretable Continual Learning

2026-04-16 · Amirhosein Javadi, Tuomas Oikarinen, Tara Javidi, Tsui-Wei Weng

Research Track A · General AI

Catastrophic forgetting remains a fundamental challenge in continual learning, in which models often forget previous knowledge when fine-tuned on a new task. This issue is especially pronounced in class incremental learning (CIL), which is the most challenging setting in continual learning. Existing methods to address …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Tree of Concepts: Interpretable Continual Learners in Non-Stationary Clinical Domains

2026-04-18 · Dongkyu Cho, Xiyue Li, Samrachana Adhikari, Rumi Chunara

Research Track A · General AI

Continual learning aims to update models under distribution shift without forgetting, yet many high-stakes deployments, such as healthcare, also require interpretability. In practice, models that adapt well (e.g., deep networks) are often opaque, while models that are interpretable (e.g., decision trees) are brittle un…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Lifecycle-Aware Federated Continual Learning in Mobile Autonomous Systems

2026-04-22 · Beining Wu, Jun Huang

Research Track A

Federated continual learning (FCL) allows distributed autonomous fleets to adapt collaboratively to evolving terrain types across extended mission lifecycles. However, current approaches face several key challenges: 1) they use uniform protection strategies that do not account for the varying sensitivities to forgettin…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Temporally Extended Mixture-of-Experts Models

2026-04-22 · Zeyu Shen, Peter Henderson

Research Track A · General AI

Mixture-of-Experts models, now popular for scaling capacity at fixed inference speed, switch experts at nearly every token. Once a model outgrows available GPU memory, this churn can render optimizations like offloading and pre-fetching ineffective. We make the case that the options framework in reinforcement learning …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Fine-Tuning Regimes Define Distinct Continual Learning Problems

2026-04-23 · Paul-Tiberiu Iordache, Elena Burceanu

Research Track A · General AI

Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defined by the trainable …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.0

OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models

2026-04-25 · Yida Xue, Ningyu Zhang, Tingwei Wu, Zhe Ma, Daxiong Ji, Zhao Wang, Guozhou Zheng, Huajun Chen

General AI

The vast and underexplored ocean plays a critical role in regulating global climate and supporting marine biodiversity, yet artificial intelligence has so far delivered limited impact in this domain due to a fundamental data bottleneck. Specifically, ocean data are highly fragmented across disparate sources and inheren…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

2026-04-27 · Sivajeet Chand, Kevin Nguyen, Peter Kuntz, Alexander Pretschner

Research Track A · General AI

Large language models (LLMs) perform strongly on general-purpose code generation, yet their applicability to enterprise domain-specific languages (DSLs) remains underexplored, especially for repository-scale change generation spanning multiple files and folder structures from a single natural-language (NL) instruction.…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Learning to Forget: Continual Learning with Adaptive Weight Decay

2026-04-29 · Aditya A. Ramesh, Alex Lewandowski, Jürgen Schmidhuber

Research Track A · General AI

Continual learning agents with finite capacity must balance acquiring new knowledge with retaining the old. This requires controlled forgetting of knowledge that is no longer needed, freeing up capacity to learn. Weight decay, viewed as a mechanism for forgetting, can serve this role by gradually discarding information…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.0

T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

2026-05-04 · Haixin Wang, Hejie Cui, Chenwei Zhang, Xin Liu, Shuowei Jin, Shijie Geng, Xinyang Zhang, Nasser Zalmout, Zhenyu Shi, Yizhou Sun

General AI

Recent progress in multi-turn reinforcement learning (RL) has significantly improved reasoning LLMs' performances on complex interactive tasks. Despite advances in stabilization techniques such as fine-grained credit assignment and trajectory filtering, instability remains pervasive and often leads to training collapse…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.0

SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning

2026-05-10 · Kun Xiang, Terry Jingchen Zhang, Zirong Liu, Bokai Zhou, Yueling Tang, Junjie Yu, Jiacong Lu, Shangrui Huang, Heng Li, Likui Zhang, Kunkun Liu, Changzheng Zhang, Yangle Fang, Boqiang Guo, Hui-Ling Zhen, Dandan Tu, Yinya Huang, Xiaodan Liang

General AI

We introduce SeePhys Pro, a fine-grained modality transfer benchmark that studies whether models preserve the same reasoning capability when critical information is progressively transferred from text to image. Unlike standard vision-essential benchmarks that evaluate a single input form, SeePhys Pro features four sema…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.0

Reflective Prompt Tuning through Language Model Function-Calling

2026-05-20 · Farima Fatahi Bayat, Moin Aminnaseri, Pouya Pezeshkpour, Estevam Hruschka

General AI

Large language models (LLMs) have become increasingly capable of following instructions and complex reasoning, making prompting a flexible interface for adapting models without parameter updates. Yet prompt design remains labor-intensive and highly sensitive to formatting, phrasing, and instruction order, motivating au…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 14.0

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

2026-05-20 · Shuofei Qiao, Yunxiang Wei, Jiazheng Fan, Bin Wu, Busheng Zhang, Mengru Wang, Yuqi Zhu, Ningyu Zhang, Keyan Ding, Qiang Zhang, Huajun Chen

General AI

The exponential growth of global academic output has confronted researchers and AI agents with an unprecedented "information explosion," where fragmented and unstructured knowledge organization impedes deep interdisciplinary integration. Current academic retrieval tools predominantly rely on superficial keyword match…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 14.0

Manufactured Confidence: How Memory Consolidation Turns Hearsay into Confident Facts

2026-06-28 · Alex Kwon

Research Track A · General AI

LLM agents carry conclusions across steps and sessions in compressed memory, and memory products (e.g., mem0, LangMem) rewrite conversation into stored "facts" that later steps trust. We show this rewriting manufactures confidence: across our constructed agent settings, a casual, hedged remark becomes a confident, date…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.8

Semantic Risk-Aware Heuristic Planning for Robotic Navigation in Dynamic Environments: An LLM-Inspired Approach

2026-05-04 · Hamza Ahmed Durrani, Rafay Suleman Durrani

General AI

The integration of Large Language Model (LLM) reasoning principles into classical robot path planning represents a rapidly emerging research direction. In this paper, we propose a Semantic Risk-Aware Heuristic (SRAH) planner that encodes LLM-inspired cost functions penalising geometrically cluttered or high-risk zones …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.8

Tool Use as Action: Towards Agentic Control in Mobile Core Networks

2026-05-04 · Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi, Xueli An

General AI

Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes in the design of network entities, interfaces, and procedures. The adoption of agentic AI in next-generation networks is expected to enhance network intelligence and auto…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.8

Detecting Time Series Anomalies Like an Expert: A Multi-Agent LLM Framework with Specialized Analyzers

2026-05-07 · Hyeongwon Kang, Jeongseob Kim, Jinwoo Park, Pilsung Kang

General AI

Recent studies have explored large language models for time-series anomaly detection, yet existing approaches often rely on a single general-purpose model to directly infer anomaly indices or intervals, limiting controllability, interpretability, and reliability for complex anomaly patterns. We propose SAGE (Specialize…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.8

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

2026-05-07 · Isaac David, Arthur Gervais

General AI

Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many operational settings, the most accessible artifacts are binary packages rather than source patches or advisory text. This paper asks whether a language-model agent, restricted t…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.8

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

2026-05-07 · Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava

General AI

Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcomer searches an unfam…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving

2026-05-11 · Kefei Tian, Yuansheng Lian, Kai Yang, Xiangdong Chen, Shen Li

General AI

Safety-critical planning in complex environments, particularly at urban intersections, remains a fundamental challenge for autonomous driving. Existing methods, whether rule-based or data-driven, frequently struggle to capture complex scene semantics, infer potential risks, and make reliable decisions in rare, high-ris…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers

2026-05-12 · Haoyu Wang, Yuliang Song, Tao Li, Zhiwei Deng, Yaqing Wang, Deepak Ramachandran, Eldan Cohen, Dan Roth

General AI

Large Language Models (LLMs) struggle to solve complex combinatorial problems through direct reasoning, so recent neuro-symbolic systems increasingly use them to synthesize executable solvers. A central design question is how the LLM should represent the solver, and whether it should also attempt to optimize search. We…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

LychSim: A Controllable and Interactive Simulation Framework for Vision Research

2026-05-12 · Wufei Ma, Chloe Wang, Siyi Chen, Jiawei Peng, Patrick Li, Alan Yuille

General AI

While self-supervised pretraining has reduced vision systems' reliance on synthetic data, simulation remains an indispensable tool for closed-loop optimization and rigorous out-of-distribution (OOD) evaluation. However, modern simulation platforms often present steep technical barriers, requiring extensive expertise in…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling

2026-05-12 · Zhong Li, Zihan Guo, Xiaohan Lu, Juntao Wang, Jie Song, Chao Shen, Jiageng Wu, Mingyang Sun

Research Track A · General AI

Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization sema…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

2026-05-19 · Han Li, Vibhor Malik, Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ailin Fan, Keat Yang Koay, Yuanzheng Zhu, Meysam Feghhi, Ronie Uliana, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Zhong Wu, Lingyun Wang

Research Track B · General AI

A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM)…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

2026-05-22 · Joydeep Chandra

General AI

Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared differential-privacy budget. We present CHRONOS, …

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

2026-05-22 · Liupeng Li, Haoqian Kang, Zhenyu Lu, Jinpeng Wang, Bin Chen, Ke Chen, Yaowei Wang

General AI

High-resolution (HR) image perception presents a key bottleneck for multimodal large language models (MLLMs). While visual search offers a promising solution, existing methods struggle with the trade-off between coverage and efficiency. Visual expert-assisted search is efficient but prone to blind spots when proposals …

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

2026-05-22 · Michal Shlapentokh-Rothman, Prachi Garg, Yu-Xiong Wang, Derek Hoiem

General AI

Keyframe selection is a direct way to provide verifiable visual evidence for long-video question answering (QA). Queries differ in what they require, and finding the right frames depends on knowing what to look for. Existing keyframe selectors either score every frame against a single query, or decompose the query into…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

Self-Supervised Online Robot-Agnostic Traversability Estimation for Open-World Environments

2026-05-27 · Julia Hindel, Simon Bultmann, Houman Masnavi, Daniele Cattaneo, Abhinav Valada

Research Track A · General AI

Self-supervised online traversability estimation enables robots to continuously learn from unlabeled open-world experiences and adapt their navigation behavior toward safe and efficient trajectories. Existing approaches either rely on handcrafted proprioceptive traversability scores, limiting robot-agnosticism, or clus…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

2026-05-28 · Tenghao Huang, Kung-Hsiang Huang, Prafulla Kumar Choubey, Yilun Zhou, Muhao Chen, Jonathan May, Chien-Sheng Wu

Research Track B · General AI

Web agents, which couple language models with browsing and tool-use capabilities, show promise as open web assistants. Yet progress is increasingly limited by the lack of scalable, process-level supervision. Existing benchmarks are largely manually constructed, providing only coarse start-goal annotations without inter…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?

2026-05-28 · Sy-Tuyen Ho, Minghui Liu, Huy Nghiem, Furong Huang

General AI

Autonomous AI research agents aim to accelerate scientific discovery by automating the research pipeline, from hypothesis generation to peer review. However, existing benchmarks rarely test a fundamental bottleneck: whether Large Language Models can judge the methodological viability of a research idea before expending…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

Preference-Aware Rubric Learning for Personalized Evaluation

2026-05-29 · Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yuxin Chen, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, Yoko Yamakata, Tat-Seng Chua

General AI

As Large Language Models (LLMs) evolve from general-purpose assistants to user-centric agents, personalization has become central to aligning model behavior with individual preferences, making the evaluation of personalized alignment a critical bottleneck. Existing evaluation methods, ranging from automatic metrics to L…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

Learning from Mistakes: Rollout-Retrieval Lifelong Policy Learning for Autonomous Driving

2026-06-29 · Cheng Gong, Haoyang Wang, Chao Lu, Zirui Li, Jianwei Gong

Research Track A · General AI

Autonomous driving policies should be able to improve continually as deployment exposes them to increasingly diverse and long-tail traffic situations. However, most learning-based policies are trained or fine-tuned on expert demonstrations and then rely largely on generalization to handle challenging closed-loop scenar…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.8

Uncertainty-Aware Generation and Decision-Making Under Ambiguity

2026-06-29 · Nico Daheim, Iryna Gurevych

General AI

With rapidly improving capabilities, Large Language Models (LLMs) are increasingly used in many complex real-world tasks. Beyond requiring in-depth knowledge and reasoning skills, many of these tasks exhibit a high degree of subjectivity and require that the outputs of the model can be trusted. While a lot of progress …

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.6

RetailSMV: Exocentric vs. Egocentric Adaptation of Foundation Video World Models in Retail

2026-07-01 · Amirreza Rouhi, Rajat Aggarwal, Parikshit Sakurikar, Anoop M. Namboodiri, Sashi P. Reddi

General AI

Foundation video diffusion models are increasingly viewed as world simulators for embodied agents, yet their pretraining on internet-scale generic video leaves them poorly aligned with real-world deployment domains. We study parameter-efficient adaptation of a pretrained foundation video world model to retail scenes: w…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.6

Alignment Is All You Need For X-to-4D Generation

2026-07-02 · Qiaowei Miao, Kehan Li, Yawei Luo, Yi Yang

General AI

Generative diffusion models excel at synthesizing high-quality images, videos, and 3D content under multimodal control. However, arbitrary user-defined modality-to-4D (X-to-4D) generation remains challenging due to the high cost of constructing diverse datasets and the limited scalability of existing methods. This pape…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.6

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

2026-07-02 · Jiale Amber Wang, Kaiyuan Wang, Pengyu Nie

General AI

Software tests and code evolve together: a code change should be followed by new or updated tests that record the new software behavior. Yet existing test generation and update benchmarks often isolate the test from the code change, and rely on static metadata that does not verify whether a test is executable or semant…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

2026-03-20 · Chiyu Ma, Shuo Yang, Kexin Huang, Jinda Lu, Haoming Meng, Shangshang Wang, Bolin Ding, Soroush Vosoughi, Guoyin Wang, Jingren Zhou

General AI

We present Future-KL Influenced Policy Optimization (FIPO), a reinforcement learning algorithm designed to overcome reasoning bottlenecks in large language models. While GRPO style training scales effectively, it typically relies on outcome-based rewards (ORM) that distribute a global advantage uniformly across every t…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

GEMS: Agent-Native Multimodal Generation with Memory and Skills

2026-03-30 · Zefeng He, Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Yu Cheng, Yang Yang

General AI

Recent multimodal generation models have achieved remarkable progress on general-purpose generation tasks, yet continue to struggle with complex instructions and specialized downstream tasks. Inspired by the success of advanced agent frameworks such as Claude Code, we propose GEMS (Agent-Native Multimodal GEneration wi…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

2026-03-31 · Zhaohe Liao, Kaixun Jiang, Zhihang Liu, Yujie Wei, Junqiu Yu, Quanhao Li, Hong-Tao Yu, Pandeng Li, Yuzheng Wang, Zhen Xing, Shiwei Zhang, Chen-Wei Xie, Yun Zheng, Xihui Liu

General AI

Although image generation has boosted various applications via its rapid evolution, whether the state-of-the-art models are able to produce ready-to-use academic illustrations for papers is still largely unexplored. Directly comparing or evaluating the illustration with VLM is native but requires oracle multi-modal und…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.5

Dual-Imbalance Continual Learning for Real-World Food Recognition

2026-03-31 · Xiaoyan Zhang, Jiangpeng He

Research Track A · General AI

Visual food recognition in real-world dietary logging scenarios naturally exhibits severe data imbalance, where a small number of food categories appear frequently while many others occur rarely, resulting in long-tailed class distributions. In practice, food recognition systems often operate in a continual learning se…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

2026-04-02 · Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, Jiacheng Zhu, Xuan Jiang, Sirui Li, Cathy Wu, Bryan Kian Hsiang Low, Jinhua Zhao, Paul Pu Liang

General AI

Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We present CORAL, the first …

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

ClawArena: Benchmarking AI Agents in Evolving Information Environments

2026-04-05 · Haonian Ji, Kaiwen Xiong, Siwei Han, Peng Xia, Shi Qiu, Yiyang Zhou, Jiaqi Liu, Jinlong Li, Bingzhou Li, Zeyu Zheng, Cihang Xie, Huaxiu Yao

General AI

AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface through corrections rath…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

SkillX: Automatically Constructing Skill Knowledge Bases for Agents

2026-04-06 · Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang, Shuofei Qiao, Kexin Cao, Guozhou Zheng, Xiang Qi, Peng Zhang, Shumin Deng

General AI

Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this …

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

2026-04-06 · Chaoyou Fu, Haozhi Yuan, Yuhao Dong, Yi-Fan Zhang, Yunhang Shen, Xiaoxing Hu, Xueying Li, Jinsen Su, Chengwu Long, Xiaoyao Xie, Yongkang Xie, Xiawu Zheng, Xue Yang, Haoyu Cao, Yunsheng Wu, Ziwei Liu, Xing Sun, Caifeng Shan, Ran He

General AI

With the rapid advancement of video understanding, existing benchmarks are becoming increasingly saturated, exposing a critical discrepancy between inflated leaderboard scores and real-world model capabilities. To address this widening gap, we introduce Video-MME-v2, a comprehensive benchmark designed to rigorously eva…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts

2026-04-07 · Weiyue Li, Ruizhi Qian, Yi Li, Yongce Li, Yunfan Long, Jiahui Cai, Yan Luo, Mengyu Wang

General AI

Large language models (LLMs) are widely explored for reasoning-intensive research tasks, yet resources for testing whether they can infer scientific conclusions from structured biomedical evidence remain limited. We introduce MedConclusion, a large-scale dataset of 5.7M PubMed structured abstracts for biomedical conclu…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

2026-04-13 · Hanqi Xiao, Vaidehi Patil, Zaid Khan, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal

General AI

As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. We propose a novel p…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

2026-04-16 · Yixuan Ding, Wei Huang, Ruijie Quan, Xiaojuan Qi, Yi Yang

General AI

Diffusion-based image editing has achieved strong visual fidelity under natural language instructions, yet most existing systems still operate at the level of surface instruction following, without reasoning about the implicit contextual constraints embedded in real user requests. This often leads to visually plausible…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs

2026-04-17 · Sai Srinivas Kancheti, Aditya Sanjiv Kanade, Vineeth N. Balasubramanian, Tanuja Ganu

General AI

Multimodal Reasoning Models (MRMs) leveraging Chain-of-Thought (CoT) based thinking have revolutionized mathematical and logical problem-solving. However, we show that this paradigm struggles with generalized spatial intelligence. We perform a comprehensive evaluation of seventeen models across thirteen spatial benchma…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation

2026-04-18 · Bo Li, Ningyuan Deng, Tianyu Dong, Shaobo Wang, Shaolin Zhu, Lijie Wen

General AI

Multimodal large language models (MLLMs) have shown impressive capabilities, yet they often struggle to effectively capture the fine-grained textual information within images crucial for accurate image translation. This often leads to a modality gap between visual text inputs and textual inputs/outputs for image transl…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale

2026-04-19 · Xinyu Zhu, Yuzhu Cai, Zexi Liu, Cheng Wang, Fengyang Li, Wenkai Jin, Wanxu Liu, Zehao Bing, Bingyang Zheng, Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xianghe Pang, Yaxin Du, Tingjia Miao, Yuzhi Zhang, Ruoxue Liao, Zhaohan Ding, Linfeng Zhang, Yanfeng Wang, Weinan E, Siheng Chen

General AI

The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we pres…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics

2026-04-19 · Yueyang Ding, HaoPeng Zhang, Rui Dai, Yi Wang, Tianyu Zong, Kaikui Liu, Xiangxiang Chu

General AI

Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge …

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Latent Preference Modeling for Cross-Session Personalized Tool Calling

2026-04-20 · Yejin Yoon, Minseo Kim, Taeuk Kim

General AI

Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge

2026-04-20 · Sua Lee, Sanghee Park, Jinbae Im

General AI

Multimodal Large Language Models (MLLMs) have been increasingly used as automatic evaluators, a paradigm known as MLLM-as-a-Judge. However, their reliability and vulnerabilities to biases remain underexplored. We find that many MLLM judges fail to reliably integrate key visual or textual cues, yielding unreliable evalua…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

OpenGame: Open Agentic Coding for Games

2026-04-20 · Yilei Jiang, Jinyuan Hu, Qianyin Xiao, Yaozhi Zheng, Ruize Ma, Kaituo Feng, Jiaming Han, Tianshuo Peng, Kaixuan Fan, Manyuan Zhang, Xiangyu Yue

General AI

Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (LLMs) and code agents now solve isolated programming tasks with ease, they consis…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models

2026-04-23 · Vipula Rawte, Ryan Rossi, Franck Dernoncourt, Nedim Lipka

General AI

Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation poses significant risks in high-stakes domains such as healthcare, law, and scientific communication, where trust and veri…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

2026-04-26 · Fanqing Meng, Lingxiao Du, Zijian Wu, Guanzheng Chen, Xiangyan Liu, Jiaqi Liao, Chonghe Jiang, Zhenglin Wan, Jiawei Gu, Pengfei Zhou, Rui Huang, Ziqi Zhao, Shengyuan Ding, Ailing Yu, Bo Peng, Bowei Xia, Hao Sun, Haotian Liang, Ji Xie, Jiajun Chen, Jiajun Song, Liu Yang, Ming Xu, Qionglin Qiu, Runhao Fu, Shengfang Zhai, Shijian Wang, Tengfei Ma, Tianyi Wu, Weiyang Jin, Yan Wang, Yang Dai, Yao Lai, Youwei Shu, Yue Liu, Yunzhuo Hao, Yuwei Niu, Jinkai Huang, Jiayuan Zhuo, Zhennan Shen, Linyu Wu, Cihang Xie, Yuyin Zhou, Jiaheng Zhang, Zeyu Zheng, Mengkang Hu, Michael Qizhe Shieh

General AI

Language-model agents are increasingly used as persistent coworkers that assist users across multiple working days. During such workflows, the surrounding environment may change independently of the agent: new emails arrive, calendar entries shift, knowledge-base records are updated, and evidence appears across images,…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Co-Director: Agentic Generative Video Storytelling

2026-04-27 · Yale Song, Yiwen Song, Nick Losier, Nathan Hodson, Ye Jin, Rhyard Zhu, Yan Xu, Daniel Vlasic, Carina Claassen, Jasmine Leon, Khanh G. LeViet, Zack Chomyn, Joe Timmons, Brett Slatkin, Scott Penberthy, Tomas Pfister

General AI

While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hier…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

2026-04-27 · Qiliang Liang, Hansi Wang, Zhong Liang, Yang Liu

General AI

LLM agents increasingly rely on reusable skills, capability packages that combine instructions, control flow, constraints, and tool calls. In most current agent systems, however, skills are still represented by text-heavy artifacts, including SKILL.md-style documents and structured records whose machine-usable evidence…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

2026-04-27 · Chenkai Pan, Xinglong Xu, Yuhang Xu, Yujun Wu, Siyuan Li, Jintao Chen, Conghui He, Jingxuan Wei, Cheng Tan

General AI

Reliably transferring specialized human knowledge from text into large language models remains a fundamental challenge in artificial intelligence. Fine-tuning on domain corpora has enabled substantial capability gains, but the process operates without feedback: when a model fails on a domain task, there is no method to…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents

2026-04-27 · Jiaqi Wang, Wenhao Zhang, Weijie Shi, Yaliang Li, James Cheng

General AI

On-policy distillation (OPD) has shown strong potential for transferring reasoning ability from frontier or domain-specific models to smaller students. While effective on static single-turn tasks, its behavior in multi-turn agent settings remains underexplored. In this work, we identify a key limitation of vanilla OPD …

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

2026-04-28 · Lei Xiong, Kun Luo, Ziyi Xia, Wenbo Zhang, Jin-Ge Yao, Zheng Liu, Jingying Shao, Jianlyu Chen, Hongjin Qian, Xi Yang, Qian Yu, Hao Li, Chen Yue, Xiaan Du, Yuyang Wang, Yesheng Liu, Haiyu Xu, Zhicheng Dou

General AI

Autonomous scientific research is significantly advanced thanks to the development of AI agents. One key step in this process is finding the right scientific literature, whether to explore existing knowledge for a research problem, or to acquire evidence for verifying assumptions and supporting claims. To assess AI age…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

2026-04-29 · Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Yuxiang Zhou, Maria Liakata, Nikolaos Aletras

General AI

Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific p…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

2026-06-02 · Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang, Yihao Liu, Jingwei Ni, Jiaqi Guo, Mengyu Zhou, Kai Tang, Junling Liu, Qinliang Su, Xiaoxi Jiang, Guanjun Jiang

General AI

Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria such as rule-based verifiers, ground-truth references, procedural checklists, and complex ru…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data

2026-06-03 · XiuYu Zhang, Yi Shan, Junfeng Fang, Zhenkai Liang

General AI

Large language models are increasingly evaluated by other models, raising a natural question: can a model predict how a judge will score its own output? We find that the ability is largely present before any targeted training: prompted few-shot, a base model already predicts an external judge's multi-attribute quality …

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

2026-06-04 · Woojung Song, Nalim Kim, Sangjun Song, Chaewon Heo, Jongwon Lim, Yohan Jo

General AI

Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter, not whether responses align with the character's psychological trajectory, especially in scenarios the source…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Towards One-to-Many Temporal Grounding

2026-06-04 · Qi Xu, Yue Tan, Shihao Chen, Jiahao Meng, Anna Wang, Shunping Ji, Hao Fei, Jason Li

General AI

Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query -- a setting we term One-to-Many Temporal Grounding (OMTG). Pr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.5

Phase model analysis of the effect of M-current on neural synchrony in hippocampal networks

2026-06-10 · Megha Manoj, Sue Ann Campbell

Research Track A

Neural assemblies, transiently coordinated groups of neurons observed in the hippocampus, are thought to underlie the formation of episodic memories. Acetylcholine (ACh), a neuromodulator received by the hippocampus, plays a critical role in memory and learning. A well-supported hypothesis suggests that high l…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.5

Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

2026-06-13 · Yuheng Lu, Qingcheng Zeng, Heli Qi, Puxuan Yu, Fuheng Zhao, Rui Yang, Hitomi Yanaka, Naoto Yokoya, Weihao Xuan

General AI

Deep research agents are increasingly evaluated on their ability to search for evidence, reason over retrieved sources, and produce grounded answers. Existing browsing benchmarks, however, largely assume that the user's query and the supporting evidence are written in the same language, leaving open whether agentic sea…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.5

CLIMB: Centroid-Based Hierarchical Memory for Online Continual Self-Supervised Learning

2026-06-30 · Julien Lefebvre, Stefan Duffner, Mathieu Lefort

Research Track A · General AI

Online Continual Self-Supervised Learning (OCSSL) aims to learn representations from a continuous stream of unlabeled data, without knowledge of task boundaries and under memory constraints. Existing methods rely either on replay buffers that exploit latent space structure, or on regularization alone. We present CLIMB …

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.4

SKILL-DISCO: Distilling and Compiling Agent Traces into Reusable Procedural Skills

2026-06-25 · Zhongxin Guo, Danrui Qi, Hanwen Gu, Peng Cheng, Yongqiang Xiong

Research Track B · General AI

Agents often repeatedly solve similar task instances from scratch, leading to unnecessary reasoning cost and long execution traces. Prior work has explored workflow reuse and executable skill induction, but it remains unclear which task scenarios admit procedural skills and how the shared procedural structure should be…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

WebNavigator: Global Web Navigation via Interaction Graph Retrieval

2026-03-20 · Xuanwang Zhang, Yuteng Han, Jinnan Qi, Mulong Xie, Zhen Wu, Xinyu Dai

Research Track B · General AI

Despite significant advances in autonomous web navigation, current methods remain far from human-level performance in complex web environments. We argue that this limitation stems from Topological Blindness, where agents are forced to explore via trial-and-error without access to the global topological structure of the…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

2026-03-26 · Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava

General AI

We present an empirical study of how far general-purpose coding agents, without hardware-specific training, can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming

2026-03-26 · Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz

General AI

Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-referenced overhead imagery, enabling GPS-denied localization and navigation. Existing methods almost universally formulate CVGL as an image-retrieval problem in a contrastively trained embedding space. This ties per…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems

2026-03-27 · Shanglin Wu, Yuyang Luo, Yueqing Liang, Kaiwen Shi, Yanfang Ye, Ali Payani, Kai Shu

Research Track A · General AI

Large language model (LLM) multi-agent systems can scale along two distinct dimensions: by increasing the number of agents and by improving through accumulated experience over time. Although prior work has studied these dimensions separately, their interaction under realistic cost constraints remains unclear. In this p…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Fine-Tuning Large Language Models for Cooperative Tactical Deconfliction of Small Unmanned Aerial Systems

2026-03-30 · Iman Sharifi, Alex Zongo, Peng Wei

General AI

The growing deployment of small Unmanned Aerial Systems (sUASs) in low-altitude airspaces has increased the need for reliable tactical deconfliction under safety-critical constraints. Tactical deconfliction involves short-horizon decision-making in dense, partially observable, and heterogeneous multi-agent environments…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Can Commercial LLMs Be Parliamentary Political Companions? Comparing LLM Reasoning Against Romanian Legislative Expuneri de Motive

2026-03-31 · Iulian Lucău, Adelin-George Voicu

General AI

This paper evaluates whether commercial large language models (LLMs) can function as reliable political advisory tools by comparing their outputs against official legislative reasoning. Using a dataset of 15 Romanian Senate law proposals paired with their official explanatory memoranda (expuneri de motive), we test six…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy

2026-03-31 · Shi Li, Vinkle Srivastav, Nicolas Chanel, Saurav Sharma, Nabani Banik, Lorenzo Arboit, Kun Yuan, Pietro Mascagni, Nicolas Padoy

General AI

Surgical procedures are inherently complex and risky, requiring extensive expertise and constant focus to navigate evolving intraoperative scenes well. Computer-assisted systems such as surgical visual question answering (VQA) offer promise for education and intraoperative support. Current surgical VQA research largel…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Think Anywhere in Code Generation

2026-03-31 · Xue Jiang, Tianyu Zhang, Ge Li, Mengyang Liu, Taozhi Chen, Zhenhua Xu, Binhua Li, Wenpin Jiao, Zhi Jin, Yongbin Li, Yihong Dong

General AI

Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer. However, this approach suffers from critical limitations in code generation, where upfront thinking is often insufficient as problems' full complexity only reveals itself duri…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning

2026-04-02 · Bangji Yang, Hongbo Ma, Jiajun Fan, Ge Liu

General AI

Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning quality or require comp…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Steerable Visual Representations

2026-04-02 · Jona Ruthardt, Manu Gaur, Deva Ramanan, Makarand Tapaswi, Yuki M. Asano

General AI

Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to focus on the most salient visual cues in the image, with no way to direct them towar…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

2026-04-02 · Gengsheng Li, Tianyu Yang, Junfeng Fang, Mingyang Song, Mao Zheng, Haiyun Guo, Dan Zhang, Jinqiao Wang, Tat-Seng Chua

General AI

Reinforcement learning with verifiable rewards (RLVR) has become a standard paradigm for post-training large language models. While Group Relative Policy Optimization (GRPO) is widely adopted, its coarse credit assignment uniformly penalizes failed rollouts, lacking the token-level focus needed to efficiently address s…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

RL-Driven Sustainable Land-Use Allocation for the Lake Malawi Basin

2026-04-04 · Ying Yao

Research Track A · General AI

Unsustainable land-use practices in ecologically sensitive regions threaten biodiversity, water resources, and the livelihoods of millions. This paper presents a deep reinforcement learning (RL) framework for optimizing land-use allocation in the Lake Malawi Basin to maximize total ecosystem service value (ESV). Drawin…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation

2026-04-06 · Hengrui Gu, Xiaotian Han, Yujing Bian, Kaixiong Zhou

General AI

Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models (LLMs). However, it faces a fundamental limitation termed restricted exploration, where the policy rapidly converges to a narrow set of solutions. While entropy regularization is…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache

2026-04-07 · Shao Wang, Rui Ren, Lin Gui

General AI

The serving paradigm of large language models (LLMs) is rapidly shifting towards complex multi-agent workflows where specialized agents collaborate over massive shared contexts. While Low-Rank Adaptation (LoRA) enables the efficient co-hosting of these specialized agents on a single base model, it introduces a critical…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives

2026-04-07 · Changgeon Ko, Jisu Shin, Hoyun Song, Huije Lee, Eui Jun Hwang, Jong C. Park

General AI

Large language model (LLM) agents are increasingly acting as human delegates in multi-agent environments, where a representative agent integrates diverse peer perspectives to make a final decision. Drawing inspiration from social psychology, we investigate how the reliability of this representative agent is undermined …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

UAVReason: A Unified, Large-Scale Benchmark for Multimodal Aerial Scene Reasoning and Generation

2026-04-07 · Jintao Sun, Hu Zhang, Donglin Di, Gangyi Ding, Zhedong Zheng

General AI

Vision-Language models (VLMs) have demonstrated remarkable capability in ground-view visual understanding but often fracture when deployed on high-altitude Unmanned Aerial Vehicles (UAVs). The failure largely stems from a pronounced domain shift, characterized by tiny and densely packed objects, repetitive textures, an…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Joint Optimization of Reasoning and Dual-Memory for Self-Learning Diagnostic Agent

2026-04-08 · Bingxuan Li, Simo Du, Yue Guo

Research Track A · General AI

Clinical expertise improves not only by acquiring medical knowledge, but also by accumulating experience that yields reusable diagnostic patterns. Recent LLM-based diagnostic agents have shown promising progress in clinical reasoning for decision support. However, most approaches treat cases independently, limiting experie…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning

2026-04-09 · Shiwan Zhao, Zhihu Wang, Xuyang Zhao, Jiaming Zhou, Caiyue Xu, Chenfei Liu, Liting Zhang, Yuhang Jia, Yanzhe Zhang, Hualong Yu, Zichen Xu, Qicheng Li, Yong Qin

Research Track A · General AI

Post-training has become central to turning pretrained large language models (LLMs) into aligned and deployable systems. Recent progress spans supervised fine-tuning (SFT), preference optimization, reinforcement learning (RL), process supervision, verifier-guided methods, distillation, and multi-stage pipelines. Yet th…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts

2026-04-09 · Haolei Xu, Haiwen Hong, Hongxing Li, Rui Zhou, Yang Zhang, Longtao Huang, Hui Xue, Yongliang Shen, Weiming Lu, Yueting Zhuang

General AI

Multimodal Mixture-of-Experts (MoE) models have achieved remarkable performance on vision-language tasks. However, we identify a puzzling phenomenon termed Seeing but Not Thinking: models accurately perceive image content yet fail in subsequent reasoning, while correctly solving identical problems presented as pure tex…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

2026-04-09 · Xingyu Xia, Lekai Zhou, Yujie Tang, Xiaozhou Zhu, Hai Zhu, Wen Yao

General AI

Aerial vision-and-language navigation (Aerial VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and autonomously navigate complex three-dimensional environments by grounding language in visual perception. This survey provides a critical and analytical review of the Aerial VL…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

What do Language Models Learn and When? The Implicit Curriculum Hypothesis

2026-04-09 · Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja, Jen-tse Huang, Graham Neubig

General AI

Large language models (LLMs) can perform remarkably complex tasks, yet the fine-grained details of how these capabilities emerge during pretraining remain poorly understood. Scaling laws on validation loss tell us how much a model improves with additional compute, but not what skills it acquires in which order. To reme…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

2026-04-13 · Yuqian Yuan, Wenqiao Zhang, Juekai Lin, Yu Zhong, Mingjian Gao, Binhe Yu, Yunqi Cao, Wentong Li, Yueting Zhuang, Beng Chin Ooi

General AI

Large Multimodal Models (LMMs) have achieved remarkable progress in general-purpose vision-language understanding, yet they remain limited in tasks requiring precise object-level grounding, fine-grained spatial reasoning, and controllable visual manipulation. In particular, existing systems often struggle to identify …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Agentic Discovery with Active Hypothesis Exploration for Visual Recognition

2026-04-14 · Jaywon Koo, Jefferson Hernandez, Ruozhen He, Hanjie Chen, Chen Wei, Vicente Ordonez

General AI

We introduce HypoExplore, an agentic framework that formulates neural architecture discovery for visual recognition as a hypothesis-driven scientific inquiry. Given a human-specified high-level research direction, HypoExplore ideates, implements, evaluates, and improves neural architectures through evolutionary branchi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Don't Show Pixels, Show Cues: Unlocking Visual Tool Reasoning in Language Models via Perception Programs

2026-04-14 · Muhammad Kamran Janjua, Hugo Silva, Di Niu, Bahador Rashidi

General AI

Multimodal language models (MLLMs) are increasingly paired with vision tools (e.g., depth, flow, correspondence) to enhance visual reasoning. However, despite access to these tool-generated visual cues, MLLMs often fail to benefit from them. Existing approaches typically feed raw tool outputs into the model, but these …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models

2026-04-14 · Han Bao, Penghao Zhang, Yue Huang, Zhengqing Yuan, Yanchi Ru, Rui Su, Yujun Zhou, Xiangqi Wang, Kehan Guo, Nitesh V Chawla, Yanfang Ye, Xiangliang Zhang

General AI

Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present PolicyBench, the first large-scale cross-syst…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback

2026-04-14 · Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu

Research Track B · General AI

Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces, where sub-pixel accuracy is required to interact with dense IDE elements, remains underexplored. Existing a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Agentic Microphysics: A Manifesto for Generative AI Safety

2026-04-16 · Federico Pierucci, Matteo Prandi, Marcantonio Bracale Syrnikov, Marcello Galisai, Piercosma Bisconti

General AI

This paper advances a methodological proposal for safety research in agentic AI. As systems acquire planning, memory, tool use, persistent identity, and sustained interaction, safety can no longer be analysed primarily at the level of the isolated model. Population-level risks arise from structured interaction among ag…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Blue Data Intelligence Layer: Streaming Data and Agents for Multi-source Multi-modal Data-Centric Applications

2026-04-16 · Moin Aminnaseri, Farima Fatahi Bayat, Nikita Bhutani, Jean-Flavien Bussotti, Kevin Chan, Rafael Li Chen, Yanlin Feng, Jackson Hassell, Estevam Hruschka, Eser Kandogan, Hannah Kim, James Levine, Seiji Maekawa, Jalal Mahmud, Kushan Mitra, Naoki Otani, Pouya Pezeshkpour, Nima Shahbazi, Chen Shen, Dan Zhang

General AI

NL2SQL systems aim to address the growing need for natural language interaction with data. However, real-world information rarely maps to a single SQL query because (1) users express queries iteratively, (2) questions often span multiple data sources beyond the closed-world assumption of a single database, and (3) queri…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Feedback-Driven Execution for LLM-Based Binary Analysis

2026-04-16 · XiangRui Zhang, Qiang Li, Haining Wang

General AI

Binary analysis increasingly relies on large language models (LLMs) to perform semantic reasoning over complex program behaviors. However, existing approaches largely adopt a one-pass execution paradigm, where reasoning operates over a fixed program representation constructed by static analysis tools. This formulation …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies

2026-04-16 · Alexey Khoroshilov, Alexey Chernysh, Orkhan Ekhtibarov, Nini Kamkia, Dmitry Zmitrovich

General AI

Large language models have demonstrated strong performance on general-purpose programming tasks, yet their ability to generate executable algorithmic trading strategies remains underexplored. Unlike standard code benchmarks, trading-strategy generation requires simultaneous mastery of domain-specific financial logic, k…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis

2026-04-17 · Vitor F. Grizzi, Thang Duc Pham, Luke N. Pretzie, Jiayi Xu, Murat Keceli, Cong Liu

General AI

Computational X-ray absorption near-edge structure (XANES) is widely used to probe local coordination environments, oxidation states, and electronic structure in chemically complex systems. However, the use of computational XANES at scale is constrained more by workflow complexity than by the underlying simulation meth…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Learning to Reason with Insight for Informal Theorem Proving

2026-04-17 · Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song

General AI

Although most automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language models' (LLMs) strength in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the diff…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

SwanNLP at SemEval-2026 Task 5: An LLM-based Framework for Plausibility Scoring in Narrative Word Sense Disambiguation

2026-04-17 · Deshan Sumanathilaka, Nicholas Micallef, Julian Hough, Saman Jayasinghe

General AI

Recent advances in language models have substantially improved Natural Language Understanding (NLU). Although widely used benchmarks suggest that Large Language Models (LLMs) can effectively disambiguate, their practical applicability in real-world narrative contexts remains underexplored. SemEval-2026 Task 5 addresses…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization

2026-04-17 · Siddhant Bharadwaj, Ashish Vashist, Fahimul Aleem, Shruti Vyas

General AI

Image geolocalization has traditionally been addressed through retrieval-based place recognition or geometry-based visual localization pipelines. Recent advances in Vision-Language Models (VLMs) have demonstrated strong zero-shot reasoning capabilities across multimodal tasks, yet their performance in geographic infere…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Phase-Scheduled Multi-Agent Systems for Token-Efficient Coordination

2026-04-19 · Mohit Dubey

Research Track B · General AI

Multi-agent systems (MAS) powered by large language models suffer from severe token inefficiency arising from two compounding sources: (i) unstructured parallel execution, where all agents activate simultaneously irrespective of input readiness; and (ii) unrestricted context sharing, where every agent receives the full…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Document-as-Image Representations Fall Short for Scientific Retrieval

2026-04-20 · Ghazal Khalighinejad, Raghuveer Thirukovalluru, Alexander H. Oh, Bhuwan Dhingra

General AI

Many recent document embedding models are trained on document-as-image representations, embedding rendered pages as images rather than the underlying source. Meanwhile, existing benchmarks for scientific document retrieval, such as ArXivQA and ViDoRe, treat documents as images of pages, implicitly favoring such represe…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Sessa: Selective State Space Attention

2026-04-20 · Liubomyr Horbatko

General AI

Modern sequence models are dominated by Transformers, where self-attention mixes information from the visible context in an input-dependent way. However, when retrieval is not sharp and attention remains diffuse over an effective support $S_{\mathrm{eff}}(t)$, the influence of any individual token is diluted, typically…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

RespondeoQA: a Benchmark for Bilingual Latin-English Question Answering

2026-04-22 · Marisa Hudspeth, Patrick J. Burns, Brendan O'Connor

General AI

We introduce a benchmark dataset for question answering and translation in bilingual Latin and English settings, containing about 7,800 question-answer pairs. The questions are drawn from Latin pedagogical sources, including exams, quizbowl-style trivia, and textbooks ranging from the 1800s to the present. After automa…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Self-Awareness before Action: Mitigating Logical Inertia via Proactive Cognitive Awareness

2026-04-22 · Fulong Fan, Peilin Liu, Fengzhe Liu, Shuyan Yang, Gang Yan

General AI

Large language models perform well on many reasoning tasks, yet they often lack awareness of whether their current knowledge or reasoning state is complete. In non-interactive puzzle settings, the narrative is fixed and the underlying structure is hidden; once a model forms an early hypothesis under incomplete premises…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

StructMem: Structured Memory for Long-Horizon Behavior in LLMs

2026-04-23 · Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong, Yunzhi Yao, Yuqi Zhu, Lun Du, Shumin Deng

Research Track A · General AI

Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approaches face a fundamental trade-off: flat memory is efficient but fails to model relational structure, while graph-based m…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Task-Driven Co-Design of Heterogeneous Multi-Robot Systems

2026-04-23 · Maximilian Stralz, Meshal Alharbi, Yujun Huang, Gioele Zardini

General AI

Designing multi-agent robotic systems requires reasoning across tightly coupled decisions spanning heterogeneous domains, including robot design, fleet composition, and planning. Much effort has been devoted to isolated improvements in these domains, whereas system-level co-design considering trade-offs and task requir…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Rethinking Math Reasoning Evaluation: A Robust LLM-as-a-Judge Framework Beyond Symbolic Rigidity

2026-04-24 · Erez Yosef, Oron Anschel, Shunit Haviv Hakimi, Asaf Gendler, Adam Botach, Nimrod Berman, Igor Kviatkovsky

General AI

Recent advancements in large language models have led to significant improvements across various tasks, including mathematical reasoning, which is used to assess models' intelligence in logical reasoning and problem-solving. Models are evaluated on mathematical reasoning benchmarks by verifying the correctness of the f…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Can LLMs Act as Historians? Evaluating Historical Research Capabilities of LLMs via the Chinese Imperial Examination

2026-04-27 · Lirong Gao, Zeqing Wang, Yuyan Cai, Jiayi Deng, Yanmei Gu, Yiming Zhang, Jia Zhou, Yanfei Zhang, Junbo Zhao

General AI

While Large Language Models (LLMs) have increasingly assisted in historical tasks such as text processing, their capacity for professional-level historical reasoning remains underexplored. Existing benchmarks primarily assess basic knowledge breadth or lexical understanding, failing to capture the higher-order skills, …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

MARD: A Multi-Agent Framework for Robust Android Malware Detection

2026-04-28 · Xueying Zeng, Youquan Xian, Sihao Liu, Xudong Mou, Yanze Li, Lei Cui, Bo Li

Research Track A · General AI

With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models (LLMs) demonstrate remarkable sem…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

2026-04-28 · Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen, Kai Xu, Quanjun Yin, Ee-Chien Chang

Research Track B · General AI

Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into webpage content to induce unintended actions. This threat is further amplified for screenshot-based web agents, which opera…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

2026-04-29 · Bochao Liu, Zhipeng Qian, Yang Zhao, Xinyuan Jiang, Zihan Liang, Yufei Ma, Junpeng Zhuang, Ben Chen, Shuo Yang, Hongen Wan, Yao Wu, Chenyi Lei, Xiao Liang

General AI

Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but or…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Factorized Latent Reasoning for LLM-based Recommendation

2026-04-29 · Tianqi Gao, Chengkai Huang, Zihan Wang, Cao Liu, Ke Zeng, Lina Yao

General AI

Large language models (LLMs) have recently been adopted for recommendation by framing user preference modeling as a language generation problem. However, existing latent reasoning approaches typically represent user intent with a single latent vector, which struggles to capture the inherently multi-faceted nature of us…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

State Beyond Appearance: Diagnosing and Improving State Consistency in Dial-Based Measurement Reading

2026-04-29 · Yuanze Hu, Gen Li, Yuqin Lan, Qingchen Yu, Zhichao Yang, Junwei Jing, Zhaoxin Fan, Xiaotie Deng

General AI

Multimodal large language models (MLLMs) have achieved impressive progress on general multimodal tasks, yet they remain brittle on dial-based measurement reading. In this paper, we study this problem through controlled benchmarks and feature-space probing, and show that current MLLMs not only achieve unsatisfactory acc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

2026-04-30 · Xin Zhou, Dingkang Liang, Xiwu Chen, Feiyang Tan, Dingyuan Zhang, Hengshuang Zhao, Xiang Bai

General AI

Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. Conversely, while Large Language Models (LLMs) demonstrate impressive reaso…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

2026-04-30 · Keming Wu, Zuhao Yang, Kaichen Zhang, Shizun Wang, Haowei Zhu, Sicong Leng, Zhongyu Yang, Qijie Wang, Sudong Wang, Ziting Wang, Zili Wang, Hui Zhang, Haonan Wang, Hang Zhou, Yifan Pu, Xingxuan Li, Fangneng Zhan, Bo Li, Lidong Bing, Yuxin Song, Ziwei Liu, Wenhu Chen, Jingdong Wang, Xinchao Wang, Xiaojuan Qi, Shijian Lu, Bin Wang

General AI

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis towa…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design

2026-04-30 · Ivan Bercovich

General AI

Terminal-agent benchmarks have become a primary signal for measuring the coding and system-administration capabilities of large language models. As the market for evaluation environments grows, so does the pressure to ship tasks quickly, often without thorough adversarial review of the verification logic. This paper is…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems

2026-05-01 · Saeid Jamshidi, Foutse Khomh, Carol Fung, Kawser Wazed Nafi

General AI

The adoption of Internet of Things (IoT) systems at the network edge of smart architectures is increasing rapidly, intensifying the need for security mechanisms that are both adaptive and resource-efficient. In such environments, runtime defence mechanisms are no longer limited to detection alone but become a resource-…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

2026-06-03 · Jingwen Chen, Wenkai Yang, Shengda Fan, Wenbo Nie, Chenxing Sun, Shaodong Zheng, Yangen Hu, Lu Pan, Ke Zeng, Yankai Lin

Research Track A · General AI

Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). While prior work has predominantly focused on single-iteration transfer, we discover that under multi-iteration exper…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Latent Reasoning with Normalizing Flows

2026-06-04 · Guancheng Tu, Xiangjun Fu, Suhao Yu, Yao Tang, Haoqiang Kang, Lianhui Qin, Yizhe Zhang, Jiatao Gu

General AI

Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

AliyunConsoleAgent: Training Web Agents in Real-World Cloud Environments via Distillation and Reinforcement Learning

2026-06-08 · Bojie Rong, Zheyu Shen, Qiaoping Wang, Pengfei Kang, Yang Xu, Yawen Wei, Hanyu Wu, Zhi Zhao, Leihao Pei, Linquan Jiang

Research Track B · General AI

We present AliyunConsoleAgent, a web agent framework for automated documentation verification in real-world cloud consoles. Major cloud platforms encompass hundreds of products with rapid feature iteration, causing console UIs to frequently diverge from their corresponding documentation. Verifying that documented proce…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

LargeMonitor: Monitoring Online Task-Free Continual Learning via Large Pretrained Models

2026-06-08 · Mingqi Yuan, Xiaoquan Sun, Shihao Luo, Jiayu Chen

Research Track A · General AI

Online task-free continual learning (TFCL) requires intelligent agents to sequentially accumulate knowledge from an unbounded, non-stationary data stream under strict single-pass constraints and without any explicit task identifiers. Existing online TFCL paradigms primarily rely on parameter-efficient prompt tuning or …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

2026-06-11 · Arnav Kumar Jain, Yilin Wu, Jesse Farebrother, Gokul Swamy, Andrea Bajcsy

General AI

The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching -- policy evaluation, policy improvement, and test-time planning -- all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: (i) fidelity…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

LEDGER: A Long-Context Benchmark of Corporate Annual Reports for Grounded Financial Retrieval and Extraction

2026-06-11 · Charles Moslonka, Amaury de Vitry, Arthur Garnier, Hicham Randrianarivo, Emmanuel Malherbe

General AI

Finance reporting is a natural proving ground for large language models, and the very-long-context capabilities of recent models across all sizes make rigorous evaluation in this domain an increasingly pressing need. Yet most public financial resources reduce the task to plain-text SEC 10-K filings paired with a handfu…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Mana: Dexterous Manipulation of Articulated Tools

2026-06-11 · Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu

General AI

Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool use remains underexplored because of its physical complexity and the difficulty o…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

2026-06-11 · Jiwen Liu, Shujuan Li, Zhixue Fang, Xiaohan Li, Yan Zhou, Zijie Meng, Zhimin Zhang, Yawen Luo, Guoxin Zhang, Yu-Shen Liu, Pengfei Wan

General AI

Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations that fail to handle multi-shot generation or synthesize cross-paired data, which suffer from data scarcity, resulting in …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

LESS Is More: Mutual-Stability Sampling for Diffusion Language Models

2026-06-15 · Amr Mohamed, Guokan Shang, Michalis Vazirgiannis

General AI

Diffusion large language models (dLLMs) offer a promising alternative to autoregressive decoding by iteratively refining masked sequences, enabling parallel token updates and bidirectional conditioning. Their practical efficiency, however, is limited by sampling procedures that execute a fixed number of reverse denoisi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

EventDrive: Event Cameras for Vision-Language Driving Intelligence

2026-06-16 · Dongyue Lu, Rong Li, Ao Liang, Lingdong Kong, Wei Yin, Lai Xing Ng, Benoit R. Cottereau, Camille Simon Chane, Wei Tsang Ooi

General AI

Event cameras sense the world through asynchronous brightness changes with microsecond latency and high dynamic range, offering motion fidelity far beyond frame-based sensors and capturing temporal structure that conventional exposures often miss. These properties make events a powerful complement to RGB in autonomous …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation

2026-06-16 · Qi Chai, Wenhao Shen, Nanjie Yao, Yue Xia, Kaiyong Zhao, Jie Ma, Guosheng Lin, Hao Wang

General AI

Zero-Shot Object-Goal Navigation (ZS-OGN) requires embodied agents to explore and locate target objects without any prior training. To this end, recent methods leverage foundation models. But they typically rely on static priors and lack adaptation, which leads to repeated errors and costly trial and error. In this pap…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Learning Cardiac Electrophysiology Digital Twins Through Agentic Discovery of Hybrid Structure

2026-06-16 · Ziqi Zhou, Yubo Ye, Sumeet Atul Vadhavka, Linwei Wang, Zhiqiang Tao

General AI

Building personalized cardiac electrophysiology (EP) digital twins requires identifying the appropriate model structure for each patient, not merely fitting parameters. Traditional methods rely on experts to manually prescribe hybrid physics-neural architectures, which requires deep domain expertise and does not transf…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents

2026-06-17 · Anoushka Vyas, Aarushi Dhanuka, Sina Khoshfetrat Pakazad, Henrik Ohlsson

General AI

Production data integration is bottlenecked by repeated, lossy handoffs between data owners, engineers, and analysts who must collaboratively discover, structure, and query enterprise data. We present Data Intelligence Agents (DIA), a system of three agents (Data Interpreter, Schema Creator, and Query Generator) that c…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

HT-Bench: Benchmarking and Learning Dexterous Full-Hand Tactile Representations with Egocentric Vision

2026-06-17 · Yuzhe Huang, Jiaping Wu, Jiaming Jiang, Hezhe Lin, Aikebaier Aierken, Yunlong Wang, Kun Cheng, Ziyuan Jiao, Yuanxin Zhong

General AI

Establishing a universal benchmark for tactile representation learning in robotic manipulation remains challenging due to the diversity of tactile sensor designs, data formats, and robot embodiments. Rather than seeking to establish such a benchmark, we explore a scalable and promising direction for future development: egocentric …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.3

Episodic-to-Semantic Consolidation Without Identity Drift

2026-07-02 · Xue Qin, Simin Luan, Cong Yang, Zhijun Li

Research Track A · General AI

Long-running adaptive intelligent agents face a structural tension between knowledge consolidation and information integrity. Memory consolidation is conventionally treated as an agent-changing operation: a model is fine-tuned, a prompt rewritten, a policy distilled, or a reflection appended to the context that governs…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.2

FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation

2026-06-23 · Haorui Ji, Weizhe Liu, Hongdong Li, Hengkai Guo

General AI

Sparse voxel representation has emerged as a scalable foundation for image-to-3D Gaussian Splatting (3DGS) generation, yet current methods struggle to preserve high-frequency visual details of input images due to two structural bottlenecks. First, they adopt discriminative 2D features optimized for semantic abstraction…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.2

Scaling Laws for Task-Specific LLM Distillation

2026-06-23 · Lavinia Ghita, Dhruv Desai, Ioana Boier

General AI

Large Language Models (LLMs) achieve strong performance across a growing range of domains, yet their scale poses deployment challenges in applications where latency and cost constraints are critical. This paper derives empirical scaling laws for domain-specific LLM compression, quantifying how in-domain and general kno…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.2

Memory-Efficient Policy Libraries with Low-Rank Adaptation in Reinforcement Learning

2026-06-24 · Samuel Valland Lyngset, Tor Viljen Raanaas, Gard Sveipe, Eirik Møller Nilsen, Jim Torresen, Kai Olav Ellefsen, Tobias Lømo

General AI

When fine-tuning Large Language Models (LLMs), there has been success in minimizing both memory usage and computation with Parameter-Efficient Fine-Tuning (PEFT), like Low Rank Adaptation (LoRA). In this article, we have explored whether this approach is transferable to the world of robotics and Reinforcement Learning …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.2

SurgAtlas: A Large-Scale Surgical Video-Language Dataset with 2,391 Hours of Open and Minimally Invasive Surgery

2026-06-24 · Filippos Bellos, Andre S. Gala-Garza, Miaowei Wang, Alyssa M. Hardin, Ahmad M. Hider, Yayuan Li, Jing Bi, Susan Liang, Chenliang Xu, Donald S. Likosky, Jason J. Corso

General AI

We introduce SurgAtlas, the largest surgical video-language dataset to date, comprising 15,291 videos (2,391 hours) spanning 18 surgical specialties and over 5,000 procedure types, sourced entirely from publicly available YouTube content. SurgAtlas is also the first surgical video-language dataset to include open surge…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.0

Learning from Many and Adapting to the Unknown in Open-set Test Streams

2026-04-01 · Xiao Zhang, Juntao Lyu, Tianyu Hu, Qianchuan Zhao, Huimin Ma

Research Track A · General AI

Large Language Models (LLMs) generalize across tasks via reusable representations and flexible reasoning, yet remain brittle in real deployment under evolving tasks and continual distribution shift. A common approach is Test-Time Adaptation (TTA), existing forms of which update models with hand-designed unsupervised ob…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.0

Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies

2026-04-01 · Zhanzhi Lou, Hui Chen, Yibo Li, Qian Wang, Bryan Hooi

Research Track B · General AI

Test-Time Learning (TTL) enables language agents to iteratively refine their performance through repeated interactions with the environment at inference time. At the core of TTL is an adaptation policy that updates the actor policy based on experience from previous episodes, thereby improving future behavior. Existing …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.0

CobwebTM: Probabilistic Concept Formation for Lifelong and Hierarchical Topic Modeling

2026-04-15 · Karthik Singaravadivelan, Anant Gupta, Zekun Wang, Christopher MacLellan, Christopher J. MacLellan

Research Track A

Topic modeling seeks to uncover latent semantic structure in text corpora with minimal supervision. Neural approaches achieve strong performance but require extensive tuning and struggle with lifelong learning due to catastrophic forgetting and fixed capacity, while classical probabilistic models lack flexibility and a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.0

AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning

2026-04-17 · Guransh Singh

Research Track A

Adapting pre-trained vision-language models (VLMs) for robotic control requires injecting high-magnitude continuous gradients from a flow-matching action expert into a backbone trained exclusively with cross-entropy. This cross-modal gradient asymmetry - the spectral dimensionality mismatch between low-rank MSE regress…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.0

Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models

2026-04-22 · Sachin Kumar

Research Track B · General AI

Can small language models achieve strong tool-use performance without complex adaptation mechanisms? This paper investigates this question through Meta-Tool, a controlled empirical study comparing hypernetwork-based LoRA adaptation against carefully designed few-shot prompting. Using a Llama-3.2-3B-Instruct backbone, w…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.0

Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation

2026-04-23 · Yi-Ling Liu, Melvin Laux, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam

Research Track A · General AI

Autonomous underwater vehicles are required to perform multiple tasks adaptively and in an explainable manner under dynamic, uncertain conditions and limited sensing, challenges that classical controllers struggle to address. This demands robust, generalizable, and inherently interpretable control policies for reliable…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 13.0

AcademiClaw: When Students Set Challenges for AI Agents

2026-05-04 · Junjie Yu, Pengrui Lu, Weiye Si, Hongliang Lu, Jiabao Wu, Kaiwen Tao, Kun Wang, Lingyu Yang, Qiran Zhang, Xiuting Guo, Xuanyu Wang, Yang Wang, Yanjie Wang, Yi Yang, Zijian Hu, Ziyi Yang, Zonghan Zhou, Binghao Qiang, Borui Zhang, Chenning Li, Enchang Zhang, Feifan Chen, Feng Jian, Fengyin Sun, Hao Qiu, Hao Zheng, Haoran Zhu, Hongyu Liu, Jianbin Deng, Jiaxin Song, Jiaying Chi, Jiayou Shi, Jie Fang, Jinghui Zhong, Jingyu Zhou, Jinze Li, Junfeng Yi, Junyan Yu, Junzhi Xue, Ni Song, Pengyi Chen, Qi Chen, Quansheng Li, Rui Tao, Shenghai Gong, Shenhang Lu, Tianqi Shen, Tianxiang Zhu, Tiehan Kang, Tingyu Li, Wendi Wu, Xiao Shen, Xiao Zhou, Xiaotao Zhang, Xinrong Li, Xuankun Yang, Xun Zhang, Yan Li, Ye Lu, Yi Wang, Yibo Zhou, Yichi Zhang, Yihao Sun, Yijun Huang, Yixin Zhu, Yixuan Wu, Yuchen Sun, Yue Wu, Yuheng Sun, Yukun Li, Yutian Tu, Yuxuan Qin, Yuzhuo Wu, Zeyu Li, Zhengyu Lou, Zhenning Ran, Zizhu He, Pengfei Liu

General AI

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' real academic workflows…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.0

Automated In-the-Wild Data Collection for Continual AI Generated Image Detection

2026-05-04 · Thanasis Pantsios, Dimitrios Karageorgiou, Christos Koutlis, George Karantaidis, Olga Papadopoulou, Symeon Papadopoulos

Research Track A · General AI

The rapid advancement of generative Artificial Intelligence (AI) has introduced significant challenges for reliable AI-generated image detection. Existing detectors often suffer from performance degradation under distribution shifts and when encountering newly emerging generative models. In this work, we propose a data…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.0

Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces

2026-05-14 · William Lugoloobi, Samuelle Marro, Jabez Magomere, Joss Wright, Chris Russell

Research Track B · General AI

As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Doing so would represent a significant security risk, enabling targeted attacks tailored to known model vulnerabilities. Across 14 frontier LLMs and four w…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 13.0

Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly

2026-05-20 · Aditya Chetan, Eric Cai, Peeyush Kushwaha, Bharath Raj Nagoor Kani, Utkarsh Mall, Qianqian Wang, Noah Snavely, Bharath Hariharan

General AI

The emergence of Large Vision-Language Models (LVLMs) has significantly advanced video understanding capabilities. However, existing benchmarks focus predominantly on coarse-grained tasks such as action segmentation, classification, captioning, and retrieval. Furthermore, these benchmarks often rely on entities that ca…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 13.0

Xetrieval: Mechanistically Explaining Dense Retrieval

2026-05-28 · Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li, Yichi Zhang, Taichuan Li, Zhuofan Chen, Zixia Jia, Zilong Zheng, Wenge Rong

General AI

Explaining why dense retrievers assign high relevance scores remains challenging because retrieval decisions are made through opaque high-dimensional embeddings. Existing explanations often focus on surface signals, such as lexical matches, token alignments, or post-hoc textual rationales, and thus provide limited insi…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 13.0

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

2026-06-27 · Han Luo, Bingbing Wen, Lucy Lu Wang

General AI

LLM agents are expected to act over multiple turns, using search, browsing interfaces, and terminal tools to complete user goals. Yet not every goal is well specified or achievable in the available environment. In such cases, a reliable agent should recognize that further interaction is unlikely to help and abstain fro…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 13.0

Convergence of Continual Learning in Homogeneous Deep Networks

2026-06-29 · Matan Schliserman, Gon Buzaglo, Itay Evron, Daniel Soudry

Research Track A

We characterize weakly regularized continual classification in homogeneous models as sequential projections onto task margin sets. This result generalizes prior analyses restricted to either stationary (single-task) deep models or continual linear models. We show that global convergence generally fails, even for simple…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development

2026-05-04 · Yuecai Zhu, Nikolaos Tsantalis, Peter C. Rigby

General AI

The promise of Large Language Models in automated software engineering is often measured by functional correctness, overlooking the critical issue of long term maintainability. This paper presents a systematic audit of technical debt in AI-generated software, revealing that AI does not eliminate flaws but rather introd…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs

2026-05-04 · Ruichao Liang, Jing Chen, Xianglong Li, Huangpeng Gu, Yebo Feng, Yue Xue, Cong Wu, Yang Liu

General AI

Smart contract vulnerabilities in Decentralized Finance cause billions of dollars in losses every year, yet the security community faces a critical bottleneck: identifying a vulnerability is not the same as proving it is exploitable. Manual PoC construction is prohibitively labor-intensive, leaving most disclosed vu…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

MINER: Mining Multimodal Internal Representation for Efficient Retrieval

2026-05-07 · Weien Li, Rui Song, Zeyu Li, Haochen Liu, Gonghao Zhang, Difan Jiao, Zhenwei Tang, Bowei He, Haolun Wu, Xue Liu, Ye Yuan

General AI

Visual document retrieval has become essential for accessing information in visually rich documents. Existing approaches fall into two camps. Late-interaction retrievers achieve strong quality through fine-grained token-level matching but store hundreds of vectors per page, incurring large index footprints and high ser…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

2026-05-07 · Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin

General AI

Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajec…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

2026-05-07 · Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao

General AI

Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches eithe…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

2026-05-12 · Di Wu, Zixiang Ji, Asmi Kawatkar, Bryan Kwan, Jia-Chen Gu, Nanyun Peng, Kai-Wei Chang

Research Track B · General AI

Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, existing memory benchmarks for agents mostly focus on user histories, short traces, or downstream task success, leaving open …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

2026-05-12 · Haiwen Diao, Penghao Wu, Hanming Deng, Jiahao Wang, Shihao Bai, Silei Wu, Weichen Fan, Wenjie Ye, Wenwen Tong, Xiangyu Fan, Yan Li, Yubo Wang, Zhijie Cao, Zhiqian Lin, Zhitao Yang, Zhongang Cai, Yuwei Niu, Yue Zhu, Bo Liu, Chengguang Lv, Haojia Yu, Haozhe Xie, Hongli Wang, Jianan Fan, Jiaqi Li, Jiefan Lu, Jingcheng Ni, Junxiang Xu, Kaihuan Liang, Lianqiang Shi, Linjun Dai, Linyan Wang, Oscar Qian, Peng Gao, Pengfei Liu, Qingping Sun, Rui Shen, Ruisi Wang, Shengnan Ma, Shuang Yang, Siyi Xie, Siying Li, Tianbo Zhong, Xiangli Kong, Xuanke Shi, Yang Gao, Yongqiang Yao, Yves Wang, Zhengqi Bai, Zhengyu Lin, Zixin Yin, Wenxiu Sun, Ruihao Gong, Quan Wang, Lewei Lu, Lei Yang, Ziwei Liu, Dahua Lin

General AI

Recent large vision-language models (VLMs) remain fundamentally constrained by a persistent dichotomy: understanding and generation are treated as distinct problems, leading to fragmented architectures, cascaded pipelines, and misaligned representation spaces. We argue that this divide is not merely an engineering arti…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

2026-05-20 · Chongrui Ye, Yuxiang Liu, Yu Wang, Haofei Yu, Yining Zhao, Ge Liu, Julian McAuley, Jiaxuan You

Research Track A · Research Track B · General AI

Language agents increasingly operate over streams of related tasks, yet existing memory systems struggle to convert accumulated experience into reusable knowledge. Retrieval-augmented and structured memory methods record per-session observations effectively, but often couple acquisition and consolidation into a single …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

One-Way Policy Optimization for Self-Evolving LLMs

2026-05-21 · Shuo Yang, Jinda Lu, Kexin Huang, Chiyu Ma, Shaohang Wei, Yuyang Liu, Guoyin Wang, Jingren Zhou, Li Yuan

General AI

Reinforcement Learning with Verifiable Rewards (RLVR) has become a promising paradigm for scaling reasoning capabilities of Large Language Models (LLMs). However, the sparsity of binary verifier rewards often leads to low efficiency and optimization instability. To stabilize training, existing methods typically impose …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

GenClaw: Code-Driven Agentic Image Generation

2026-05-28 · Junyan Ye, Jun He, Zilong Huang, Dongzhi Jiang, Xuan Yang, Rui Chen, Weijia Li

General AI

Image generation models have evolved from text-conditioned pixel synthesis toward multimodal agents endowed with visual comprehension and tool invocation capabilities. Yet, existing agents remain at the mercy of underlying black-box image models. Their workflow is trapped in a repetitive cycle of prompt rewriting for g…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

2026-05-28 · Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui, Longtao Huang, Hui Xue, Ningyu Zhang

General AI

Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used for such memory updates, existing studies mainly rely on qualitative downstream evaluations, leaving the quantitative capacity limits and unde…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

SpecBench: Evaluating Specification-Level Reasoning for Software Engineering LLM Agents

2026-05-28 · Grant Hamblin, Kevin Song, Zhanda Zhu, Anand Jayarajan, Sihang Liu, Nandita Vijaykumar, Gennady Pekhimenko

General AI

Software engineering (SWE) agents are transitioning from code generation to full software development lifecycle automation. A critical phase in this lifecycle is specification design: transforming initial proposals into carefully considered requirements through expert review. Existing benchmarks such as SWE-Bench are i…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets

2026-05-28 · M. Ross Kunz, John Merickel, Keith Wilson

General AI

Numeric tabular datasets are the dominant data format in scientific practice, yet large language models lack native mechanisms for representing numeric datasets in a meaningful way across heterogeneous feature spaces. Existing approaches either target predictive modeling over individual datasets, which requires a share…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

2026-05-28 · You-Zhe Xie, Yu-Hsuan Li, Jie-Ying Lee, Kaipeng Zhang, Yu-Lun Liu, Zhixiang Wang

General AI

As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly rely on synthetic data, limiting real-world generalization due to the sim-to-real gap. We present YoCausal, a two-level …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

Don't Fool Me Twice: Adapting to Adversity in the Wild with Experience-Driven Reasoning

2026-05-29 · Navin Sriram Ravie, Andrew Jong, Krrish Jain, John Liu, Omar Alama, Bijo Sebastian, Sebastian Scherer

Research Track A · General AI

In robotics, dangers and adversity modes are often embodiment-specific and relative to each agent. A frontier of autonomous mobile robotics is to enable agents to operate effectively in the wild in unseen unstructured environments. A significant challenge in unseen unstructured environments is that it may not be possib…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision

2026-05-29 · Rosario Forte, Giuseppe Lando, Antonino Furnari

Research Track A · General AI

Continuous episodic memory is a core capability for autonomous agents operating in dynamic, real-world environments, yet current streaming video benchmarks provide limited tools for diagnosing what models remember and for how long. We introduce EGOSTREAM, a diagnostic benchmark for streaming episodic memory evaluation…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

Skill Reuse as Compression in Agentic RL

2026-05-29 · Zhikun Xu, Yu Feng, Jacob Dineen, Taiwei Shi, Jieyu Zhao, Ben Zhou

General AI

Large language model agents trained with reinforcement learning (RL) often learn brittle, task-specific shortcuts. We hypothesize that agents generalize better when their successful trajectories are structurally compressible, decomposed into a small set of reusable abstract patterns. To formalize this, we introduce Reu…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.8

Words Speak Louder Than Code: Investigating Cognitive Heuristics in LLM-Based Code Vulnerability Detection

2026-06-29 · Asif Shahriar, Hongyu Cai, Hadjer Benkraouda, Gang Wang, Z. Berkay Celik

General AI

Researchers and practitioners increasingly apply Large Language Models (LLMs) for automated vulnerability detection. Recent work has shown that LLMs are susceptible to the same cognitive heuristics that bias human judgment. Yet, no work has investigated whether these heuristics affect a model's assessment of code vulne…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.8

AnyGroundBench: A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models

2026-07-02 · Rintaro Otsubo, Ryo Fujii, Reina Ishikawa, Taiki Kanaya, Kanta Sawafuji, Hiroki Kajita, Shigeki Sakai, Hideo Saito, Ryo Hachiuma

Research Track A · General AI

Vision-Language Models (VLMs) have demonstrated immense promise in Spatio-Temporal Video Grounding (STVG). However, current evaluation protocols are largely confined to zero-shot assessments on general, daily-life benchmarks. This creates a critical disconnect from real-world applications in specialized fields, where m…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.6

Bayesian Sparse Low-Rank Adaptation for Large Language Model Uncertainty Estimation

2026-07-02 · Jijie Zhang, Zhe Ren, Quan Zhang, Dandan Guo

General AI

Large language models (LLMs) exhibit remarkable reasoning capabilities, but their task-specific fine-tuning is notoriously plagued by overconfidence, severely hindering trustworthy deployment. We propose Data-Adaptive Lower-Rank Adaptation (DALorRA), a simple and effective variational Bayesian sparse framework that shi…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.6

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

2026-07-02 · Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng

General AI

Many everyday programming tasks resist clean rule-based implementation, such as alerting on important log lines, repairing malformed JSON, or ranking search results by intent, and are increasingly outsourced to large language model APIs at the cost of locality, reproducibility, and price. We propose fuzzy-function prog…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval

2026-03-17 · Shuvam Banerji Seal, Aheli Poddar, Alok Mishra, Dwaipayan Roy

General AI

This paper introduces AgriIR, a configurable retrieval augmented generation (RAG) framework designed to deliver grounded, domain-specific answers while maintaining flexibility and low computational cost. Instead of relying on large, monolithic models, AgriIR decomposes the information access process into declarative mo…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.5

Reframing Long-Tailed Learning via Loss Landscape Geometry

2026-03-22 · Shenghan Chen, Yiming Liu, Yanzhen Wang, Yujia Wang, Xiankai Lu

Research Track A · General AI

Balancing the performance trade-off on long-tail (LT) data distributions remains a long-standing challenge. In this paper, we posit that this dilemma stems from a phenomenon called "tail performance degradation" (the model tends to severely overfit on head classes while quickly forgetting tail classes) and pose a solution …

Review: pending
Role: unreviewed
Read: soon

huggingface Score 12.5

BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment

2026-03-25 · Risa Shinoda, Kaede Shiohara, Nakamasa Inoue, Kuniaki Saito, Hiroaki Santo, Fumio Okura

General AI

Understanding animal species from multimodal data poses an emerging challenge at the intersection of computer vision and ecology. While recent biological models, such as BioCLIP, have demonstrated strong alignment between images and textual taxonomic information for species identification, the integration of the audio …

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

2026-03-25 · Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi, Jacob Andreas, Yoon Kim

General AI

Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mode. While this is generally not a problem for benchmark-style evaluations that assume one correct answer, many real-wor…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

2026-03-30 · He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, Zixian Huang, Sheng Yuan, Qinxiu Cheng, Xinchen Xie, Yicheng Chen, Yining Li, Jiaxing Xie, Huanan Dong, Yaguang Wu, Xiangjun Huang, Jian Yang, Hui Wang, Bowen Zhou, Bowen Li, Qipeng Guo, Kai Chen

General AI

We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.5

Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis

2026-04-01 · Xingxing Weng, Ruifeng Ni, Chao Pang, XiangYu Hao, Yishan Wang, Xiaokang Zhang, Wei Xu, Gui-Song Xia

Research Track A · General AI

Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually …

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.5

ProTPS: Prototype-Guided Text Prompt Selection for Continual Learning

2026-04-01 · Jie Mei, Li-Leng Peng, Keith Fuller, Jenq-Neng Hwang

Research Track A

For continual learning, text-prompt-based methods leverage text encoders and learnable prompts to encode semantic features for sequentially arrived classes over time. A common challenge encountered by existing works is how to learn unique text prompts, which implicitly carry semantic information of new classes, so that…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

Can LLMs Learn to Reason Robustly under Noisy Supervision?

2026-04-05 · Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen

General AI

Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis of noisy label mech…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration

2026-04-05 · Satyam Kumar, Saurabh Jha

General AI

Deploying large language models (LLMs) on heterogeneous edge devices demands frameworks that jointly optimize energy efficiency, inference quality, and reliability. Our prior QEIL v1 (Kumar & Jha, 2026) achieved 4.82x IPW improvement but relied on static efficiency factors, greedy optimization, and unverified candidate…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

2026-04-06 · Yujian Liu, Jiabao Ji, Li An, Tommi Jaakkola, Yang Zhang, Shiyu Chang

General AI

Agent skills, which are reusable, domain-specific knowledge artifacts, have become a popular mechanism for extending LLM-based agents, yet formally benchmarking skill usage performance remains scarce. Existing skill benchmarking efforts focus on overly idealized conditions, where LLMs are directly provided with hand-cr…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation

2026-04-12 · Sandro Andric

General AI

Large language models are increasingly used as agents in social, economic, and policy simulations. A common assumption is that stronger reasoning should improve simulation fidelity. We argue that this assumption can fail when the objective is not to solve a strategic problem, but to sample plausible boundedly rational …

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks

2026-04-13 · Yuqing Yang, Tengxiao Liu, Wang Bill Zhu, Taiwei Shi, Linxin Song, Robin Jia

General AI

As LLM-based assistants become persistent and personalized, they must extract and retain useful information from past conversations as memory. However, the types of information worth remembering vary considerably across tasks. We formalize the heterogeneous memory extraction task and introduce BEHEMOTH, a benchmark tha…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

2026-04-16 · Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh

General AI

Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results: it never sees how the corpus is organized or what it has not yet retrieved, limiting its ability to backtrack or combine scattered evidence. We present Corpus2Skill, which distil…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

Scaling Test-Time Compute for Agentic Coding

2026-04-16 · Joongwon Kim, Wannan Yang, Kelvin Niu, Hongming Zhang, Yun Zhu, Eryk Helenowski, Ruan Silva, Zhengxing Chen, Srinivasan Iyer, Manzil Zaheer, Daniel Fried, Hannaneh Hajishirzi, Sanjeev Arora, Gabriel Synnaeve, Ruslan Salakhutdinov, Anirudh Goyal

General AI

Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined. Long-horizon coding agents violate this premise: each attempt produces an extended trajectory of actions, observations, erro…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.5

Continual Hand-Eye Calibration for Open-world Robotic Manipulation

2026-04-17 · Fazeng Li, Gan Sun, Chenxi Liu, Yao He, Wei Cong, Yang Cong

Research Track A

Hand-eye calibration through visual localization is a critical capability for robotic manipulation in open-world environments. However, most deep learning-based calibration models suffer from catastrophic forgetting when adapting to unseen data amid open-world scene changes, while simple rehearsal-based continual …

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.5

HyCal: A Training-Free Prototype Calibration Method for Cross-Discipline Few-Shot Class-Incremental Learning

2026-04-17 · Eunju Lee, MiHyeon Kim, JuneHyoung Kwon, Yoonji Lee, JiHyun Kim, Soojin Jang, YoungBin Kim

Research Track A · General AI

Pretrained Vision-Language Models (VLMs) like CLIP show promise in continual learning, but existing Few-Shot Class-Incremental Learning (FSCIL) methods assume homogeneous domains and balanced data distributions, limiting real-world applicability where data arises from heterogeneous disciplines with imbalanced sample av…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

AgentSearchBench: A Benchmark for AI Agent Search in the Wild

2026-04-24 · Bin Wu, Arastun Mammadli, Xiaoyu Zhang, Emine Yilmaz

General AI

The rapid growth of AI agent ecosystems is transforming how complex tasks are delegated and executed, creating a new challenge of identifying suitable agents for a given task. Unlike traditional tools, agent capabilities are often compositional and execution-dependent, making them difficult to assess from textual descr…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

Step-Audio-R1.5 Technical Report

2026-04-28 · Yuxin Zhang, Xiangyu Tony Zhang, Daijiao Liu, Fei Tian, Yayue Deng, Jun Chen, Qingjian Lin, Haoyang Zhang, Yuxin Li, Jinglan Gong, Yechang Huang, Liang Zhao, Chengyuan Yao, Hexin Liu, Eng Siong Chng, Xuerui Yang, Gang Yu, Xiangyu Zhang, Daxin Jiang

General AI

Recent advancements in large audio language models have extended Chain-of-Thought (CoT) reasoning into the auditory domain, enabling models to tackle increasingly complex acoustic and spoken tasks. To elicit and sustain these extended reasoning chains, the prevailing paradigm -- driven by the success of text-based reas…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.5

Anytime Training with Schedule-Free Spectral Optimization

2026-05-21 · Anuj Apte, Pranav Deshpande, Niraj Kumar, Shouvanik Chakrabarti, Junhyung Lyle Kim

Research Track A

Standard neural network training relies on learning-rate schedules tied to a fixed horizon, leading to strong path dependence and costly re-tuning as data availability changes. Schedule-Free (SF) methods address this by removing explicit schedules, yet SF-AdamW, the current state-of-the-art anytime optimizer, consisten…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

Multimodal Music Recommendation System using LLMs

2026-05-28 · Srikar Prabhas Kandagatla, Sreehitha R. Narayana, Chandana Magapu, Swetha Mohan, Shamanth Kuthpadi, Hongjie Chen, Ryan A. Rossi, Franck Dernoncourt, Nesreen Ahmed

General AI

Music recommendation systems typically treat songs as opaque tokens, relying on collaborative interaction histories which overlooks semantic or acoustic content. Prior work has explored LLM-augmented, multimodal, and text-enhanced approaches to sequential recommendation, and while some methods partially combine semanti…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.5

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

2026-05-29 · Haoxiang Zhang, Qixin Xu, Zhuofeng Li, Lei Zhang, Pengcheng Jiang, Yu Zhang, Julian McAuley

Research Track B · General AI

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps …

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

AdaCodec: A Predictive Visual Code for Video MLLMs

2026-06-01 · Haowen Hou, Zhen Huang, Zheming Liang, Qingyi Si, Chenglin Li, Shuai Dong, Kele Shao, Ruilin Li, Dianyi Wang, Nan Duan, Jiaqi Wang

General AI

Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video MLLMs) usually encode each sampled frame as an independent RGB image, causing visual tokens to repeat content already present in earlier frames. This suggests a m…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

2026-06-04 · Hanxu Hu, Zdeněk Šnajdr, Pinzhen Chen, Jannis Vamvas, Rico Sennrich

Research Track A · General AI

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment

2026-06-08 · Yang Tian, Rui Wang, Xumeng Wen, Junjie Li, Shizhao Sun, Lei Song, Jiang Bian, Bo Zhao

General AI

Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-based reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is especially pronounced …

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.5

GUI-AC: Enhancing Continual Learning in GUI Agents

2026-06-09 · Can Lin, Tao Feng, Hangjie Yuan, Dan Zhang, Yifan Zhu, Zhonghong Ou

Research Track A · Research Track B · General AI

Graphical User Interfaces (GUIs) serve as the dominant medium for human-computer interaction, yet building GUI agents that generalize across the vast diversity of real-world interface environments, with the same flexibility and robustness that humans naturally exhibit, remains unsolved. Notably, GUI data are inherently…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents

2026-06-11 · Siyi Chen, Xiaoyan Zhang, Meng Wu, Jonathan Tremblay, Valts Blukis, Stan Birchfield, Rene Vidal, Alvaro Velasquez, Sijia Liu, Qing Qu

General AI

Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterog…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.5

DreamX-World 1.0: A General-Purpose Interactive World Model

2026-06-15 · DreamX Team, Yancheng Bai, Rui Chen, Xiangxiang Chu, Rujing Dang, Hao Dou, Bingjie Gao, Qiwen Gu, Siyu Hong, Jiachen Lei, Geng Li, Jifan Li, Ruimin Lin, Qingfeng Shi, Bingze Song, Lei Sun, Jing Tang, Ruitian Tian, Jun Wang, Jiahong Wu, Pengfei Zhang, Shen Zhang, Jiashu Zhu

General AI

DreamX-World 1.0 is a general-purpose interactive text/image-to-video world model for controllable long-horizon generation. It supports camera navigation, revisits to previously observed regions, and promptable events across photorealistic, game-style, and stylized domains. Our data engine combines camera-accurate Unre…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.4

Fast and Slow Variational Continual Learning

2026-06-22 · Subarnaduti Paul, Yohan Jung, Mohammad Emtiyaz Khan, Siddharth Swaroop, Thomas Möllenhoff, Martin Mundt

Research Track A · General AI

Continual learning remains a major challenge for modern deep networks, partly because commonly used optimizers lack inherent mechanisms for continual adaptation. One such natural mechanism is fast and slow adaptation to balance stability and plasticity. This mechanism has deep roots in neuroscience and biology, but the…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.4

FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning

2026-06-23 · Xirui Li, Zhe Liu, Xiaoqing Ye, Wenhua Han, Yifeng Pan, Junyu Han, Hengshuang Zhao

General AI

Multimodal driving planning faces a long-standing tension between two paradigms: scoring-based methods benefit from dense reward supervision but are confined to a fixed action vocabulary, while anchor-based methods generate proposals dynamically yet suffer from sparse supervision constrained to a single ground-truth tr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.4

GCT-MARL: Graph-Based Contrastive Transfer for Sample-Efficient Cooperative Multi-Agent Reinforcement Learning

2026-06-23 · Animesh Animesh, Satheesh K Perepu, Kaushik Dey

Research Track A · General AI

In cooperative multi-agent reinforcement learning (MARL), from a deployment perspective, it is challenging and expensive to train agents from scratch for each new environment or task. In this work, we propose GCT-MARL, a transfer learning framework that builds on the multi-view graph contrastive backbone of MAIL and au…

Review: pending
Role: unreviewed
Read: now

huggingface Score 12.4

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

2026-06-23 · Chenhao Dang, Jing Ma, Mingjie Liao

General AI

The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data mixtures during training, has emerged as a promising direction to improve efficiency. Howeve…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents

2026-03-25 · Bingqing Wei, Zhongyu Xia, Dingai Liu, Xiaoyu Zhou, Zhiwei Lin, Yongtao Wang

General AI

Vision-language models (VLMs) have shown remarkable general capabilities, yet embodied agents built on them fail at complex tasks, often skipping critical steps, proposing invalid actions, and repeating mistakes. These failures arise from a fundamental gap between the static training data of VLMs and the physical inter…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

2026-03-26 · Zirui Zhang, Haoyu Dong, Kexin Pei, Chengzhi Mao

General AI

Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify …

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

Vega: Learning to Drive with Natural Language Instructions

2026-03-26 · Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

General AI

Vision-language-action models have reshaped autonomous driving to incorporate languages into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To addr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

FairLLaVA: Fairness-Aware Parameter-Efficient Fine-Tuning for Large Vision-Language Assistants

2026-03-27 · Mahesh Bhosale, Abdul Wasi, Shantam Srivastava, Shifa Latif, Tianyu Luan, Mingchen Gao, David Doermann, Xuan Gong

General AI

While powerful in image-conditioned generation, multimodal large language models (MLLMs) can display uneven performance across demographic groups, highlighting fairness risks. In safety-critical clinical settings, such disparities risk producing unequal diagnostic narratives and eroding trust in AI-assisted decision-ma…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents

2026-03-29 · Zhaopeng Feng, Liangcai Su, Zhen Zhang, Xinyu Wang, Xiaotian Zhang, Xiaobin Wang, Runnan Fang, Qi Zhang, Baixuan Li, Shihao Cai, Rui Ye, Hui Chen, Jiang Yong, Joey Tianyi Zhou, Chenxiong Qian, Pengjun Xie, Bryan Hooi, Zuozhu Liu, Jingren Zhou

Research Track B · General AI

As large language models (LLMs) evolve into autonomous agents for long-horizon information-seeking, managing finite context capacity has become a critical bottleneck. Existing context management methods typically commit to a single fixed strategy throughout the entire trajectory. Such static designs may work well in so…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

See it to Place it: Evolving Macro Placements with Vision-Language Models

2026-03-30 · Ikechukwu Uchendu, Swati Goel, Karly Hou, Ebrahim Songhori, Kuang-Huei Lee, Joe Wenjie Jiang, Vijay Janapa Reddi, Vincent Zhuang

General AI

We propose using Vision-Language Models (VLMs) for macro placement in chip floorplanning, a complex optimization task that has recently shown promising advancements through machine learning methods. Because human designers rely heavily on spatial reasoning to arrange components on the chip canvas, we hypothesize that V…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

PsychAgent: An Experience-Driven Lifelong Learning Agent for Self-Evolving Psychological Counselor

2026-04-01 · Yutao Yang, Junsong Li, Qianjun Pan, Jie Zhou, Kai Chen, Qin Chen, Jingyuan Zhao, Ningning Zhou, Xin Li, Liang He

Research Track A · General AI

Existing methods for AI psychological counselors predominantly rely on supervised fine-tuning using static dialogue datasets. However, this contrasts with human experts, who continuously refine their proficiency through clinical practice and accumulated experience. To bridge this gap, we propose an Experience-Driven Li…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

Beyond Referring Expressions: Scenario Comprehension Visual Grounding

2026-04-02 · Ruozhen He, Nisarg A. Shah, Qihua Dong, Zilin Xiao, Jaywon Koo, Vicente Ordonez

General AI

Existing visual grounding benchmarks primarily evaluate alignment between image regions and literal referring expressions, where models can often succeed by matching a prominent named category. We explore a complementary and more challenging setting of scenario-based visual grounding, where the target must be inferred …

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

2026-04-02 · Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

General AI

Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill co…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

The Tool Illusion: Rethinking Tool Use in Web Agents

2026-04-03 · Renze Lou, Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Suman Nath, Wenpeng Yin, Jianfeng Gao

Research Track B · General AI

As web agents rapidly evolve, an increasing body of work has moved beyond conventional atomic browser interactions and explored tool use as a higher-level action paradigm. Although prior studies have shown the promise of tools, their conclusions are often drawn from limited experimental scales and sometimes non-compara…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

Assessing Large Language Models for Stabilizing Numerical Expression in Scientific Software

2026-04-06 · Tien Nguyen, Muhammad Ali Gulzar, Kirshanthan Sundararajah

General AI

Scientific software relies on high-precision computation, yet finite floating-point representations can introduce precision errors that propagate in safety-critical domains. Despite the growing use of large language models (LLMs) in scientific applications, their reliability in handling floating-point numerical stabili…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

2026-04-06 · Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen

General AI

Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position during RoPE, making representative queries very few, leading to poor top-…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

Leveraging Image Editing Foundation Models for Data-Efficient CT Metal Artifact Reduction

2026-04-07 · Ahmet Rasim Emirdagi, Süleyman Aslan, Mısra Yavuz, Görkay Aydemir, Yunus Bilge Kurt, Nasrin Rahimi, Burak Can Biner, M. Akın Yılmaz

General AI

Metal artifacts from high-attenuation implants severely degrade CT image quality, obscuring critical anatomical structures and posing a challenge for standard deep learning methods that require extensive paired training data. We propose a paradigm shift: reframing artifact reduction as an in-context reasoning task by a…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

Lightweight LLM Agent Memory with Small Language Models

2026-04-09 · Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang, Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, Yang Yang

Research Track A · General AI

Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construc…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

CASK: Core-Aware Selective KV Compression for Reasoning Traces

2026-04-13 · Buseong Kim, Heejun Gwon

Research Track A · General AI

In large language models performing long-form reasoning, the KV cache grows rapidly with decode length, creating bottlenecks in memory and inference stability. Existing reasoning-oriented KV compression has mostly followed an eviction-centered view: estimate token importance more accurately, then discard lower-ranked e…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

GenTac: Generative Modeling and Forecasting of Soccer Tactics

2026-04-13 · Jiayuan Rao, Tianlin Gui, Haoning Wu, Yanfeng Wang, Weidi Xie

General AI

Modeling open-play soccer tactics is a formidable challenge due to the stochastic, multi-agent nature of the game. Existing computational approaches typically produce single, deterministic trajectory forecasts or focus on highly structured set-pieces, fundamentally failing to capture the inherent variance and branching…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

A Sanity Check on Composed Image Retrieval

2026-04-14 · Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang

General AI

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image, and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is not well characterized by existing benchmarks, which inherently contain indeter…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

Modeling Co-Pilots for Text-to-Model Translation

2026-04-14 · Serdar Kadioglu, Karthik Uppuluri, Akash Singirikonda

General AI

There is growing interest in leveraging large language models (LLMs) for text-to-model translation and optimization tasks. This paper aims to advance this line of research by introducing Text2Model and Text2Zinc. Text2Model is a suite of co-pilots based on several LLM strategies with varying …

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models

2026-04-16 · Zihao Xu, John Harvill, Ziwei Fan, Yizhou Sun, Hao Ding, Hao Wang

General AI

Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the number of tokens representing inputs. However, existing prompt-compression approache…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

Beyond Distribution Sharpening: The Importance of Task Rewards

2026-04-17 · Sarthak Mittal, Leo Gagnon, Guillaume Lajoie

General AI

Frontier models have demonstrated exceptional capabilities following the integration of task-reward-based reinforcement learning (RL) into their training pipelines, enabling systems to evolve from pure reasoning models into sophisticated agents. However, debate persists regarding whether RL genuinely instills new skill…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.3

Evaluating the Progression of Large Language Model Capabilities for Small-Molecule Drug Design

2026-04-17 · Shriram Chennakesavalu, Kirill Shmilovich, Hayley Weir, Colin Grambow, John Bradshaw, Patricia Suriana, Chen Cheng, Kangway Chuang

General AI

Large Language Models (LLMs) have the potential to accelerate small molecule drug design due to their ability to reason about information from diverse sources and formats. However, their practical utility remains unclear due to the lack of benchmarks that reflect real-world scenarios. In this work, we introduce a suite…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

FUSE: Ensembling Verifiers with Zero Labeled Data

2026-04-20 · Joonhyuk Lee, Virginia Ma, Sarah Zhao, Yash Nair, Asher Spector, Regev Cohen, Emmanuel J. Candès

General AI

Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground truth acquisition can be time-consuming and expensive. We introduce Fully Unsupervi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval

2026-04-20 · HaeJun Yoo, Yongseop Shin, Insung Lee, Myoung-Wan Koo, Du-Seong Chang

General AI

Audio-text retrieval systems based on Contrastive Language-Audio Pretraining (CLAP) achieve strong performance on traditional benchmarks; however, these benchmarks rely on caption-style queries that differ substantially from real-world search behavior, limiting their assessment of practical retrieval robustness. We pre…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

Revisiting Change VQA in Remote Sensing with Structured and Native Multimodal Qwen Models

2026-04-20 · Yakoub Bazi, Mohamad M. Al Rahhal, Mansour Zuair, Faroun Mohamed

General AI

Change visual question answering (Change VQA) addresses the problem of answering natural-language questions about semantic changes between bi-temporal remote sensing (RS) images. Although vision-language models (VLMs) have recently been studied for temporal RS image understanding, Change VQA remains underexplored in th…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

When Can LLMs Learn to Reason with Weak Supervision?

2026-04-20 · Salman Rahman, Jingyan Shen, Anna Mordvina, Hamid Palangi, Saadia Gabriel, Pavel Izmailov

General AI

Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward signals becomes increasingly difficult, making it essential to understand when RLVR can succeed under weaker forms of sup…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

PlayCoder: Making LLM-Generated GUI Code Playable

2026-04-21 · Zhiyuan Peng, Wei Tao, Xin Yin, Chenhao Ying, Yuan Luo, Yiwen Guo

General AI

Large language models (LLMs) have achieved strong results in code generation, but their ability to generate GUI applications, especially games, remains insufficiently studied. Existing benchmarks mainly evaluate correctness through test cases, which are inadequate for GUI applications because these systems are interact…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

MCP Pitfall Lab: Exposing Developer Pitfalls in MCP Tool Server Security under Multi-Vector Attacks

2026-04-23 · Run Hao, Zhuoran Tan

General AI

Model Context Protocol (MCP) is increasingly adopted for tool-integrated LLM agents, but its multi-layer design and third-party server ecosystem expand risks across tool metadata, untrusted outputs, cross-tool flows, multimodal inputs, and supply-chain vectors. Existing MCP benchmarks largely measure robustness to mali…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

2026-04-23 · Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, Liqiang Nie

General AI

Composed Image Retrieval (CIR) is an important image retrieval paradigm that enables users to retrieve a target image using a multimodal query that consists of a reference image and modification text. Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typ…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

LARA: Validation-Driven Agentic Supercomputer Workflows for Atomistic Modeling

2026-04-24 · William Dawson, Louis Beal, Yoann Curé, Giuseppe Fisicaro, Dorian Rolland, Luigi Genovese

General AI

Large language models (LLMs) and agentic systems have recently demonstrated potential for automating scientific workflows, including atomistic simulations. However, their deployment in high-performance computing (HPC) environments remains limited by the lack of mechanisms ensuring correctness, reproducibility, and safe…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems

2026-04-24 · Mengzhuo Chen, Junjie Wang, Fangwen Mu, Yawen Wang, Zhe Liu, Huanxiang Feng, Qing Wang

General AI

Failure attribution, i.e., identifying the responsible agent and decisive step of a failure, is particularly challenging in LLM-based multi-agent systems (MAS) due to their natural-language reasoning, nondeterministic outputs, and intricate interaction dynamics. A reliable benchmark is therefore essential to guide and …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference

2026-04-27 · Zahra Dehghanighobadi, Asja Fischer

General AI

Long-context reasoning is a critical capability of large language models (LLMs), enabling applications such as long-document understanding, summarization, and code generation. However, efficient autoregressive inference relies on the key-value (KV) cache, whose memory footprint grows linearly with sequence length, lead…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents

2026-04-28 · Zhou Hanlin, Chan Huah Yong

General AI

Long-horizon LLM tasks often fail not because a single answer is unattainable, but because knowledge states drift across rounds, intermediate commitments remain implicit, and interruption fractures the evolving evidence chain. This paper presents ADEMA as a knowledge-state orchestration architecture for long-horizon kn…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring

2026-04-28 · Hector G. Rodriguez, Marcus Rohrbach

General AI

Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering benchmarks approach saturation, reliable deployment requires satisfying low error tolerances in real-world out-of-distribution (OOD) scenarios. Precisely, selective predicti…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

"I Strongly Suspect This Website Is a Scam": Benchmarking PII Leakage and Detection without Defense in Autonomous Web Agents

2026-05-30 · Soham Roy, Sarthakbrata Halder, Arya Bharaty, Vaibhav Bhaskar, Yash Sinha, Dhruv Kumar, Srikant Panda, Murari Mandal

Research Track B · General AI

Deceptive web content, widely instantiated across the internet and commonly known as social-engineering attacks, manipulates autonomous web agents into submitting users' personally identifiable information (PII) to attacker-controlled endpoints. In this paper, we show that social-engineering attacks are highly…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

Projecting the Emerging Mindset of SWE Agent by Launching a Wild Code Understanding Journey

2026-06-07 · Zhengyi Zhuo, Yan Liu

General AI

Software engineering agents (SWE agents) increasingly work through tool-mediated trajectories in real repositories, yet their behavior remains difficult to characterize in concrete, observable terms. These trajectories record tool use, intermediate reasoning, evidence selection, and self-directed stopping, but they do …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

An Agency-Transferring Model-Free Policy Enhancement Technique

2026-06-08 · Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim, Pavel Osinenko

General AI

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a bas…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

Transition-Based Digital Twin Modelling for Alzheimer's Disease under Sparse Longitudinal Data

2026-06-08 · Yinyu Huang, Yilin Zhang, Sofia Michopoulou, Christopher Kipps, Rahman Attar

General AI

Alzheimer's disease (AD) progression is highly heterogeneous and is typically observed through sparse and irregular longitudinal data, posing challenges for prediction and personalised monitoring. Existing machine learning approaches have improved AD prediction using multimodal data, yet often focus on static classific…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

DarkAgents

2026-06-09 · Michele Lucente, Silvia Pascoli, Filippo Sala, Matteo Zandi

General AI

We present DarkAgents: a multi-agent system that leverages the reasoning and code-generation capabilities of large language models (LLMs), together with deterministic tested human-written code, to build orchestrated pipelines for theoretical astroparticle physics research. While related approaches have been proposed in…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

2026-06-09 · Kevin Qinghong Lin, Batu EI, Yuhong Shi, Pan Lu, Philip Torr, James Zou

General AI

Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team weeks: hunting for context, running statistics, choosing an angle, and designing visuals. Recent agents handle individual steps well: data-sci…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

2026-06-11 · Yaxin Du, Yifan Zhou, Yujie Ge, Jiajun Wang, Xianghe Pang, Shuo Tang, Tuney Zheng, Bryan Dai, Jian Yang, Siheng Chen

General AI

Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally deterministic tool workflows are unfolded into repeated model-visible decisions, consuming…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

TokenPilot: Cache-Efficient Context Management for LLM Agents

2026-06-15 · Buqiang Xu, Zirui Xue, Dianmou Chen, Chenyang Fu, Chiyu Wu, Caiying Huang, Chen Jiang, Jizhan Fang, Xinle Deng, Yijun Chen, Yunzhi Yao, Xuehai Wang, Jin Shang, Gong Yu, Ningyu Zhang

General AI

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alter layouts, introducing prefix mismatches and cache invalidation. This…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models

2026-06-16 · Nicola Franco

General AI

We evaluate the adversarial robustness of two frontier large language models (LLMs) developed by Anthropic, Fable 5 and Opus 4.8, against four families of automated jailbreak attacks across 7,826 harmful intents spanning a ten-category harm taxonomy. Using the HackAgent red-teaming framework, hundreds of thousands of ad…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

MLLMs Get It Right, Then Get It Wrong: Tracing and Correcting Late-Layer Textual Bias

2026-06-16 · Xingming Li, Ao Cheng, Qiyao Sun, Xixiang He, Xuanyu Ji, Runke Huang, Qingyong Hu

General AI

When vision contradicts text, multimodal large language models (MLLMs) consistently favor text, even when images provide clear evidence otherwise. This bias poses risks for applications requiring visual grounding, yet its cause remains unclear. In this paper, we uncover a surprising finding: models often get it right i…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

2026-06-16 · Shanda Li, Qiuhong Anna Wei, Jingwu Tang, Valerie Chen, Nihar B Shah, Tim Dettmers, Yiming Yang, Ameet Talwalkar

General AI

Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale due to their reliance on substantial manual effort for data curation and evaluation. We …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

A Mixed-Reality Testbed for Autonomous Vehicles

2026-06-17 · H. M. Sabbir Ahmad, Ehsan Sabouni, Emrullah Celik, Zean Wan, Damola Ajeyemi, Christos G. Cassandras, Wenchao Li

General AI

We propose a mixed-reality, hardware-in-the-loop (HIL) testbed for autonomous vehicles that seamlessly integrates a physical testbed of mobile robots with a high-fidelity simulation environment. The virtual simulation enables the creation of diverse, safety-critical driving scenarios to validate state-of-the-art percep…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.3

SAOT: Self-Supervised Continual Graph Learning with Structure-Aware Optimal Transport

2026-07-01 · Yuting Zhang, Yanbei Liu, Zhitao Xiao, Lei Geng, Yanwei Pang, Xiao Wang

Research Track A · General AI

Self-supervised Continual Graph Learning (CGL) aims to successively learn from a graph sequence with different tasks without label supervision - a paradigm that has attracted widespread attention. Most existing self-supervised CGL methods rely on instance-level consistency objectives that enforce stability of individua…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.2

Grad Detect: Gradient-Based Hallucination Detection in LLMs

2026-06-23 · Anand Kamat, Daniel Blake, Brent M. Werness

General AI

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet they remain prone to generating hallucinations. Detecting these hallucinations is critical for deploying LLMs reliably in high-stakes applications. We present Grad Detect, a gradient-based approach for predicting hallucinat…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.2

PHANTOM: A Large-Scale Dataset of Multimodal Adversarial Attacks for Vision-Language Models

2026-06-23 · Simone Gallivanone, Hossein Khodadadi, Mauro Dore, Mauro Medda, Nicola Franco

General AI

We introduce a large-scale, open-source dataset of pre-generated adversarial attacks for vision-language models (VLMs). The dataset is designed to be diverse, representative, and practical, extending existing benchmarks by covering 10 high-level categories and 55 subcategories of harmful intents. Our primary goal is to…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.2

AI translation of literary texts is "fine", but readers still prefer human translations

2026-06-24 · Yves Ferstler, Adam Podoxin, Ty Brassington, Roman Grundkiewicz, Maite Taboada, Marzena Karpinska

General AI

AI translation of literary works is increasingly common. While the content may be rendered adequately, we do not know enough about how readers experience it in terms of immersiveness and literary effect, aspects poorly captured by automatic machine translation metrics or human evaluation targeting fluency and adequacy.…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.2

How Robust is OCR-Reasoning? Evaluating OCR-Reasoning Robustness of Vision-Language Models under Visual Perturbations

2026-06-24 · Yuxing Cheng, Yuan Wu, Yi Chang

Research Track A · General AI

Vision-language models (VLMs) have achieved strong performance on OCR-based benchmarks and have increasingly focused on text-rich understanding, but their robustness under controlled visual degradation remains insufficiently understood. This gap is critical for OCR reasoning, where visual corruption can induce OCR errors an…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.2

RoboAtlas: Contextual Active SLAM

2026-06-24 · Alexander Schperberg, Shivam K. Panda, Abraham P. Vinod, M. K. Jawed, Stefano Di Cairano

General AI

We present RoboAtlas, a contextual Active SLAM framework that adaptively balances geometric exploration and semantic reasoning using a scalable 3D semantic mapping system, OpenRoboVox. RoboAtlas integrates frontier exploration, global semantic-map reasoning, and egocentric VLM-based reasoning through a contextual multi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

LACE: Loss-Adaptive Capacity Expansion for Continual Learning

2026-03-30 · Shivnath Tathe

Research Track A

Fixed representational capacity is a fundamental constraint in continual learning: practitioners must guess an appropriate model width before training, without knowing how many distinct concepts the data contains. We propose LACE (Loss-Adaptive Capacity Expansion), a simple online mechanism that expands a model's repre…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.0

Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning

2026-04-01 · Mohammad R. Abu Ayyash

Research Track A · General AI

We present Brainstacks, a modular architecture for continual multi-domain fine-tuning of large language models that packages domain expertise as frozen adapter stacks composing additively on a shared frozen base at inference. Five interlocking components: (1) MoE-LoRA with Shazeer-style noisy top-2 routing across all s…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation

2026-04-01 · Henry Peng Zou, Chunyu Miao, Wei-Chieh Huang, Yankai Chen, Yue Zhou, Hanrong Zhang, Yaozu Wu, Liancheng Fang, Zhengyao Gu, Zhen Zhang, Kening Zheng, Fangxin Wang, Yi Nian, Shanghao Li, Wenzhe Fan, Langzhou He, Weizhi Zhang, Xue Liu, Philip S. Yu

Research Track B · General AI

As LLM agents transition from short, static problem solving to executing complex, long-horizon tasks in dynamic environments, the ability to handle user interruptions, such as adding requirements or revising goals, during mid-task execution is becoming a core requirement for realistic deployment. However, existing bench…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

Adaptive Data Dropout: Towards Self-Regulated Learning in Deep Neural Networks

2026-04-14 · Amar Gahir, Varshil Patel, Shreyank N Gowda

Research Track A · General AI

Deep neural networks are typically trained by uniformly sampling large datasets across epochs, despite evidence that not all samples contribute equally throughout learning. Recent work shows that progressively reducing the amount of training data can improve efficiency and generalization, but existing methods rely on f…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.0

PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research

2026-04-16 · Tingjia Miao, Wenkai Jin, Muhua Zhang, Jinxin Tan, Yuelin Hu, Tu Guo, Jiejun Zhang, Yuhan Wang, Wenbo Li, Yinuo Gao, Shuo Chen, Weiqi Jiang, Yayun Hu, Zixing Lei, Xianghe Pang, Zexi Liu, Yuzhi Zhang, Linfeng Zhang, Kun Chen, Wei Wang, Weinan E, Siheng Chen

General AI

The paradigm of agentic science requires AI systems to conduct robust reasoning and engage in long-horizon, autonomous exploration. However, current scientific benchmarks remain confined to domain knowledge comprehension and complex reasoning, failing to evaluate the exploratory nature and procedural complexity of real…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

Why Fine-Tuning Encourages Hallucinations and How to Fix It

2026-04-16 · Guy Kaplan, Zorik Gekhman, Zhen Zhu, Lotem Rozner, Yuval Reif, Swabha Swayamdipta, Derek Hoiem, Roy Schwartz

Research Track A · General AI

Large language models are prone to hallucinating factually incorrect statements. A key source of these errors is exposure to new factual information through supervised fine-tuning (SFT), which can increase hallucinations w.r.t. knowledge acquired during pre-training. In this work, we explore whether SFT-induced halluci…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.0

Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

2026-05-01 · Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin

General AI

Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

2026-05-07 · Yuxing Liu, Jianyu Wang, Tong Zhang

Research Track A · General AI

Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., forgetting less while achieving the same o…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.0

Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

2026-05-09 · Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, H. Vincent Poor, Christopher G. Brinton

General AI

Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraint…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

An Executable Benchmarking Suite for Tool-Using Agents

2026-05-10 · Zhiqing Zhong, Zhijing Ye, Jiamin Wang, Xiaodong Yu

Research Track B · General AI

Closed-loop tool-using agents are increasingly evaluated in executable web, code, and micro-task environments, but benchmark reports often conflate workloads, action-generating drivers, and the evidence admitted for systems-facing claims. We present an executable benchmarking suite that makes these objects explicit und…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces

2026-05-10 · Yilin Zhang, Yingkai Hua, Chunyu Wei, Xin Wang, Yueguo Chen

Research Track B · General AI

Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements. Existing approaches either detect deception without task integration or document attacks without proposing defenses. We formalize deception-aware web agent defense and pr…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.0

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

2026-05-12 · Zhong Guan, Yongjian Guo, Haoran Sun, Wen Huang, Shuai Di, Xiong Jun Wu, Likang Wu, Hongke Zhao

General AI

Asynchronous reinforcement learning improves rollout throughput for large language model agents by decoupling sample generation from policy optimization, but it also introduces a critical failure mode for PPO-style off-policy correction. In heterogeneous training systems, the total importance ratio should ideally be de…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections

2026-05-14 · Tri Cao, Yulin Chen, Hieu Cao, Yibo Li, Khoi Le, Thong Nguyen, Yuexin Li, Yufei He, Yue Liu, Shuicheng Yan, Bryan Hooi

Research Track B · General AI

Web agents can autonomously complete online tasks by interacting with websites, but their exposure to open web environments makes them vulnerable to prompt injection attacks embedded in HTML content or visual interfaces. Existing guard models still suffer from limited generalization to unseen domains and attack pattern…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents

2026-05-15 · Chinmay Savadikar, Mingyu Zhao, Yuanzheng Zhu, Han Li, Shuang Xie, Alberto Castelo, Tianfu Wu, Lingyun Wang

Research Track B · General AI

Developing and evaluating e-commerce web agents requires environments that preserve meaningful task structure while enabling controllable, reproducible, and scalable scientific comparison. Existing methodologies force a tradeoff: live storefronts provide realism but are non-stationary, difficult to inspect, and irrepro…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

Skim: Speculative Execution for Fast and Efficient Web Agents

2026-05-15 · Mike Wong, Kevin Hsieh, Suman Nath, Ravi Netravali

Research Track B · General AI

Skim is a speculative execution framework for web agents that exploits the predictable structure of purpose-built websites. Today's web-agent expense is not intrinsic to the tasks but a property of how agents are composed: frontier-model inference, browser rendering, and ReAct-style planning are applied to every step o…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.0

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

2026-05-18 · Boyuan Sun, Bowen Yin, Yuanming Li, Xihan Wei, Qibin Hou

General AI

We present SWIM (See What I Mean), a novel training strategy that aligns vision and language representations to enable fine-grained object understanding solely from textual prompts. Unlike existing approaches that require explicit visual prompts, such as masks or points, SWIM leverages mask supervision only during trai…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.0

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

2026-05-28 · Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang, Zerui Chen, Qinggang Zhang, Jinsong Su

General AI

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suf…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

2026-05-29 · Dongxin Guo, Jikun Wu, Siu Ming Yiu

Research Track B · General AI

Extended chain-of-thought reasoning can degrade performance on deterministic state-tracking tasks, not due to preference biases, but limits rooted in the information-theoretic capacity of decoder-only attention. We establish: (1) an Attention Bottleneck Theorem with a complementary achievability construction, bounding …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.0

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

2026-06-08 · Gianluca Barmina, Annemette Broch Pirchert, Andrea Blasi Núñez, Lukas Galke Poech, Peter Schneider-Kamp

Research Track A · General AI

As deep learning models scale, managing, inspecting, and modifying large checkpoints has become increasingly challenging. Researchers often need to alter model weights for layer restructuring, precision casting, low-rank factorization, and architectural debugging, yet these workflows often rely on fragile ad-hoc Python…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

Accurate and Resource-Efficient Federated Continual Learning

2026-06-09 · Jebacyril Arockiaraj, Dhruv Parikh, Jayashree Adivarahan, Rajgopal Kannan, Viktor Prasanna

Research Track A · General AI

Federated continual learning (FCL) must learn from distributed task streams under limited resources, such as communication, computation, memory, and label availability. Existing FCL methods often rely on repeated local optimization, replay, and full supervision. Analytic alternatives avoid iterative training and replay…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 12.0

GRASP: Gradient-Aligned Sequential Parameter Transfer for Memory-Efficient Multi-Source Learning

2026-06-12 · Mary Isabelle Wisell, Nicholas Jacobs, Aayush Manandhar, Salimeh Yasaei Sekeh

Research Track A · General AI

Multi-source transfer learning faces a fundamental scalability bottleneck: existing approaches require either loading all K source models into memory simultaneously during parameter fusion, requiring O(K) memory, or deploying all models at inference time, making production deployment infeasible. We propose GRASP (Gradi…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 12.0

SafePyramid: A Hierarchical Benchmark for In-context Policy Guardrailing

2026-06-29 · Jiacheng Zhang, Haoyu He, Sen Zhang, Shen Wang, Xiaolei Xu, Yuhao Sun, Meng Shen, Feng Liu

General AI

In real-world applications, guardrails are often expected to identify unsafe user-model interactions according to application-specific safety policies, rather than relying on predefined risk taxonomies. In this work, we study this setting under the paradigm of in-context policy guardrailing, where guardrails predict sa…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.9

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

2026-06-25 · Minbyul Jeong

Research Track B · General AI

Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated, especially outside English. Breadth is also hard to build: certifying that a gold set is complete…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.9

Parameter-Efficient Quantum-Inspired Fast Weight Programmers for Traffic-Matrix Forecasting

2026-06-26 · Kuo-Chung Peng, Jiun-Cheng Jiang, Chun-Hua Lin, Tai-Yue Li, Nan-Yow Chen, Samuel Yen-Chi Chen

General AI

Traffic matrices (TMs) capture network-wide origin-destination demand and are central to traffic engineering, yet accurate whole-matrix forecasting remains challenging when prediction must be performed under the memory, update, and training-budget constraints of online network control. This paper investigates whether c…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 11.8

Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection

2026-05-04 · Mohamad Khajezade, Fatemeh H. Fard, Mohamed Sami Shehata

General AI

Cross-language code clone detection (X-CCD) is challenging because semantically equivalent programs written in different languages often share little surface similarity. Although large language models (LLMs) have shown promise for semantic clone detection, their use as black-box systems raises concerns about cost, repr…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Visual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMs

2026-05-04 · Xin Zhang, Qiqi Tao, Jiawei Du, Moyun Liu, Joey Tianyi Zhou

General AI

Continuous latent-space reasoning offers a compact alternative to textual chain-of-thought for multimodal models, enabling high-dimensional visual evidence to be integrated without explicit reasoning tokens. However, we identify a previously overlooked optimization pathology in existing latent visual reasoning methods:…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches

2026-05-06 · Pietro Bonazzi, Youssef Ahmed, Daniel Eckert, Andrea Ronco, Junjie Zeng, Dengxin Dai, Michele Magno

General AI

Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multi-modal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commer…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

2026-05-07 · Hao Dong, Hongzhao Li, Shupan Li, Muhammad Haris Khan, Eleni Chatzi, Olga Fink

General AI

Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Debiased Multimodal Personality Understanding through Dual Causal Intervention

2026-05-07 · Yangfu Zhu, Zitong Han, Nianwen Ning, Yuting Wei, Yuandong Wang, Hang Feng, Zhenzhou Shao

General AI

Multimodal personality understanding plays a critical role in human-centered artificial intelligence. Previous work mainly focuses on learning rich multimodal representations for video personality understanding. However, they often suffer from potential harm caused by subject bias (e.g., observable age and unobservable mental …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

2026-05-07 · Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang

General AI

Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, prim…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Epistemic Uncertainty for Test-Time Discovery

2026-05-11 · Kainat Riaz, Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Ayesha Mohsin, Aqib Riaz, Ali Subhan, John M. Cioffi

General AI

Automated scientific discovery using large language models relies on identifying genuinely novel solutions. Standard reinforcement learning penalizes high-variance mutations, which leads the policy to prioritize familiar patterns. As a result, the maximum reward plateaus even as the average reward increases. Overcoming…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

G²TR: Generation-Guided Visual Token Reduction for Separate-Encoder Unified Multimodal Models

2026-05-12 · Junxian Li, Kai Liu, Zizhong Ding, Zhixin Wang, Zhikai Chen, Renjing Pei, Yulun Zhang

General AI

The development of separate-encoder unified multimodal models (UMMs) comes with a rapidly growing inference cost due to dense visual token processing. In this paper, we focus on understanding-side visual token reduction for improving the efficiency of separate-encoder UMMs. While this topic has been widely studied for …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

2026-05-12 · Guohui Zhang, XiaoXiao Ma, Jie Huang, Hang Xu, Hu Yu, Siming Fu, Yuming Li, Zeyue Xue, Lin Song, Haoyang Huang, Nan Duan, Feng Zhao

General AI

Recent advances in joint audio-video generation have been remarkable, yet real-world applications demand strong per-modality fidelity, cross-modal alignment, and fine-grained synchronization. Reinforcement Learning (RL) offers a promising paradigm, but its extension to multi-objective and multi-modal joint audio-video …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

2026-05-20 · Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini, Christos Kozyrakis

Research Track B · General AI

Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Archon: A Unified Multimodal Model for Holistic Digital Human Generation

2026-05-28 · Chong Bao, Shichen Liu, Lijun Yu, David Futschik, Stylianos Moschoglou, Shefali Srivastava, Ziqian Bai, Feitong Tan, Guofeng Zhang, Zhaopeng Cui, Sean Fanello, Yinda Zhang

General AI

Digital humans are fundamental to immersive interaction, yet creating a unified model for holistic modalities, including text, audio, motion, and visual content, remains an open challenge. In this paper, we present Archon, a fully pretrained, human-centric unified multimodal model for holistic avatar generation. Archon…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

On the Construction and Implications of Low-Loss Valleys in LoRA-based Bayesian Inference

2026-05-28 · Daniel Dold, Emanuel Sommer, Julius Kobialka, Oliver Dürr, David Rügamer

General AI

While parameter-efficient fine-tuning methods like low-rank adaptation (LoRA) are standard for large language models, principled estimation of epistemic uncertainty remains challenging. Recent results in the LoRA regime suggest that discrete multi-mode approaches such as deep ensembles offer little benefit over single-…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Reasoning with Sampling: Cutting at Decision Points

2026-05-28 · Felix Zhou, Anay Mehrotra, Quanquan C. Liu

General AI

Frontier reasoning models are produced by posttraining base language models with reinforcement learning. Recent work has challenged this by showing that sampling from a sharpened version of the base model's distribution, a so-called power distribution, elicits comparable reasoning without additional training, curated d…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

2026-05-28 · Xiaona Zhou, Muntasir Wahed, Tianjiao Yu, Constantin Brif, Ismini Lourentzou

General AI

Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large language or multimodal models to finding abnormal patterns in sequential data. Public anomaly detection benchmarks typically provide interval a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Attractor States Emerge in Multi-Turn LLM Conversations

2026-06-29 · Ting-Wen Ko, Jonas Geiping

General AI

Large language models (LLMs) are increasingly used in open-ended multi-agent settings, but the long-run dynamics of model-model interaction remain poorly understood. We study whether open-ended LLM discussions exhibit attractor-like behavior, i.e. topic-independent stable sets of behaviors which conversations settle i…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Chronos: A Physics-Informed Full-History Framework for Non-Markovian Long-Horizon Manipulation

2026-06-29 · Yulin Zhou, Yimeng Wang, Nengyu Wang, Shaojia Xing, Shiyun Tu, Xiang Li, Jingkai Zhang, Ningbo Jiang, Yuankai Lin, Hua Yang, Xiangrui Zeng, Zhouping Yin

General AI

General-purpose robot policies should be modeled as dynamical systems, yet many VLA and generative imitation policies still rely on present observations or short windows. This Markovian shortcut fails in memory-dependent manipulation: identical observations can demand different actions after different histories. We pre…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

On the Internet, Nobody Knows You're an LLM Bot: Unmasking Web Agents with Multi-Layer Fingerprinting

2026-06-29 · Iliana Fayolle, Sihem Bouhenniche, Samuel Pélissier, Pierre Laperdrix, Clémentine Maurice, Walter Rudametkin

Research Track B · General AI

Since 2023, a new class of bots has emerged: Web Agents. They can automate complex tasks on the Web, going beyond traditional browser automation tools such as Selenium, Puppeteer, or Playwright. Leveraging large language models (LLMs), these agents are capable of solving anti-bot mechanisms, mimicking human behavior, a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

2026-06-29 · Mohit Raghavendra, Anisha Gunjal, Aakash Sabharwal, Yunzhong He

Research Track A · General AI

We introduce SWE-Interact, a new testbed for evaluating coding agents on multi-turn, interactive, user-driven software engineering tasks. Existing frontier SWE benchmarks typically provide complete requirements upfront and evaluate agents on autonomous implementation. In contrast, SWE-Interact places agents in a realis…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.8

Sequential Planning via Anchored Robotic Keypoints

2026-06-29 · Bryce Grant, Aryeh Rothenberg, Logan Senning, Zonghe Chua, Zach Patterson, Peng Wang

General AI

We present Sequential Planning via Anchored Robotic Keypoints, SPARK, a training-free neurosymbolic manipulation system that reaches 43.7% on six LIBERO-PRO position & task cells, more than doubling CaP-Agent0 and Vision-Language-Action (VLA) baselines. CaP-Agent0, a multi-turn code-generation agent, achieves 18.2% by…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.6

Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning

2026-07-02 · Xuehui Wang, Xuankun Yang, Wei Shen

General AI

Visual token pruning is a crucial strategy for accelerating VLMs by compressing redundant image patches, yet existing methods often fail to preserve critical cues under dense instructions and fine-grained queries. In this paper, we investigate this failure and identify two underlying bottlenecks: the widespread dispers…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.6

Controllable Sim Agents with Behavior Latents

2026-07-02 · Juanwu Lu, Junyu Zhu, Ziran Wang

General AI

Realistic traffic simulation requires agents that imitate logged behavior and can also be steered along interpretable axes. Such controllability enables engineers to isolate variables, reproduce specific edge cases, and test autonomous systems without real-world risk. We introduce Controllable Neural Variational Agents…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.6

Neuron-Aware Data Selection for Annotation-Free LLM Self-Distillation

2026-07-02 · Zhuowei Chen, Xiang Lorraine Li

General AI

Post-training large language models (LLMs) without real-world interaction feedback or human-labeled supervision remains challenging, particularly in specialized domains where expert annotations are costly to obtain. Recent annotation-free self-evolution methods address this by using the model's own outputs as supervisi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.6

What LLM Agents Say When No One Is Watching: Social Structure and Latent Objective Emergence in Multi-Agent Debates

2026-07-02 · Arman Ghaffarizadeh, Danyal Mohaddes, Aliakbar Izadkhah, Shahriar Noroozizadeh

General AI

LLM agents will increasingly act in socially structured settings where role, audience, and relational context can shape what is advantageous or costly to say. We study whether such social structure, without any explicit objective in the prompt, changes what an agent expresses publicly relative to an off-the-record (OTR…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Learn2Fold: Structured Origami Generation with World Model Planning

2026-02-02 · Yanjia Huang, Yunuo Chen, Ying Jiang, Jinru Han, Zhengzhong Tu, Yin Yang, Chenfanfu Jiang

General AI

The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.5

Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development

2026-03-04 · Hung Tran, Langston Nashold, Rayan Krishnan, Antoine Bigeard, Alex Gu

Research Track B · General AI

Code generation has emerged as one of AI's highest-impact use cases, yet existing benchmarks measure isolated tasks rather than the complete "zero-to-one" process of building a working application from scratch. We introduce Vibe Code Bench, a benchmark of 100 web application specifications (50 public validation, 50 hel…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 11.5

ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

2026-03-19 · Haochen Zhao, Shaoyang Cui

Research Track B · General AI

Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 11.5

AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation

2026-03-22 · Liang Ding

Research Track B · General AI

LLM-as-Judge evaluation fails agent tasks because a fixed rubric cannot capture what matters for this task: code debugging demands Correctness and Error Handling; web navigation demands Goal Alignment and Action Efficiency. We present ADARUBRIC, which closes this gap by generating task-specific evaluation rubrics on th…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

2026-03-29 · Meituan LongCat Team, Bin Xiao, Chao Wang, Chengjiang Li, Chi Zhang, Chong Peng, Hang Yu, Hao Yang, Haonan Yan, Haoze Sun, Haozhe Zhao, Hong Liu, Hui Su, Jiaqi Zhang, Jiawei Wang, Jing Li, Kefeng Zhang, Manyuan Zhang, Minhao Jing, Peng Pei, Quan Chen, Taofeng Xue, Tongxin Pan, Xiaotong Li, Xiaoyang Li, Xiaoyu Zhao, Xing Hu, Xinyang Lin, Xunliang Cai, Yan Bai, Yan Feng, Yanjie Li, Yao Qiu, Yerui Sun, Yifan Lu, Ying Luo, Yipeng Mei, Yitian Chen, Yuchen Xie, Yufang Liu, Yufei Chen, Yulei Qian, Yuqi Peng, Zhihang Yu, Zhixiong Han, Changran Wang, Chen Chen, Dian Zheng, Fengjiao Chen, Ge Yang, Haowei Guo, Haozhe Wang, Hongyu Li, Huicheng Jiang, Jiale Hong, Jialv Zou, Jiamu Li, Jianping Lin, Jiaxing Liu, Jie Yang, Jing Jin, Jun Kuang, Juncheng She, Kunming Luo, Kuofeng Gao, Lin Qiu, Linsen Guo, Mianqiu Huang, Qi Li, Qian Wang, Rumei Li, Siyu Ren, Wei Wang, Wenlong He, Xi Chen, Xiao Liu, Xiaoyu Li, Xu Huang, Xuanyu Zhu, Xuezhi Cao, Yaoming Zhu, Yifei Cao, Yimeng Jia, Yizhen Jiang, Yufei Gao, Zeyang Hu, Zhenlong Yuan, Zijian Zhang, Ziwen Wang

General AI

The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented architectures and subopt…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.5

COvolve: Adversarial Co-Evolution of Large-Language-Model-Generated Policies and Environments via Two-Player Zero-Sum Game

2026-03-30 · Alkis Sygkounas, Rishi Hazra, Andreas Persson, Pedro Zuidberg Dos Martires, Amy Loutfi

Research Track A · General AI

A central challenge in building continually improving agents is that training environments are typically static or manually constructed. This restricts continual learning and generalization beyond the training distribution. We address this with COvolve, a co-evolutionary framework that leverages large language models (…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.5

Emergence WebVoyager: Toward Consistent and Transparent Evaluation of (Web) Agents in The Wild

2026-03-30 · Deepak Akkil, Mowafak Allaham, Amal Raj, Tamer Abuelsaad, Ravi Kokku

Research Track B · General AI

Reliable evaluation of AI agents operating in complex, real-world environments requires methodologies that are robust, transparent, and contextually aligned with the tasks agents are intended to perform. This study identifies persistent shortcomings in existing AI agent evaluation practices that are particularly acute …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Experience Transfer for Multimodal LLM Agents in Minecraft Game

2026-04-07 · Chenghao Li, Jun Liu, Songbo Zhang, Huadong Jian, Hao Ni, Lik-Hang Lee, Sung-Ho Bae, Guoqing Wang, Yang Yang, Chaoning Zhang

General AI

Multimodal LLM agents operating in complex game environments must continually reuse past experience to solve new tasks efficiently. In this work, we propose Echo, a transfer-oriented memory framework that enables agents to derive actionable knowledge from prior interactions rather than treating memory as a passive repo…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Lighting-grounded Video Generation with Renderer-based Agent Reasoning

2026-04-09 · Ziqi Cai, Taoyu Yang, Zheng Chang, Si Li, Han Jiang, Shuchen Weng, Boxin Shi

General AI

Diffusion models have achieved remarkable progress in video generation, but their controllability remains a major limitation. Key scene factors such as layout, lighting, and camera trajectory are often entangled or only weakly modeled, restricting their applicability in domains like filmmaking and virtual production wh…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Many-Tier Instruction Hierarchy in LLM Agents

2026-04-10 · Jingyu Zhang, Tianjian Li, William Jurayj, Hongyuan Zhan, Benjamin Van Durme, Daniel Khashabi

General AI

Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant parad…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

2026-04-12 · Yu Li, Xiaoran Shang, Qizhi Pei, Yun Zhu, Xin Gao, Honglin Lin, Zhanping Zhong, Zhuoshi Pan, Zheng Liu, Xiaoyang Wang, Conghui He, Dahua Lin, Feng Zhao, Lijun Wu

General AI

Post-training data plays a pivotal role in shaping the capabilities of Large Language Models (LLMs), yet datasets are often treated as isolated artifacts, overlooking the systemic connections that underlie their evolution. To disentangle these complex relationships, we introduce the concept of data lineage to the LLM e…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

2026-04-20 · Jiaqi Wang, Haoge Deng, Ting Pan, Yang Liu, Chengyuan Wang, Fan Zhang, Yonggang Qi, Xinlong Wang

General AI

Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains largely unexplored. We observe that naively applying GRPO to UDM leads to training instability and marginal performance gains. To address thi…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Learning Evidence Highlighting for Frozen LLMs

2026-04-24 · Shaoang Li, Yanhang Shi, Yufei Li, Mingfu Liang, Xiaohan Wei, Yunchen Pu, Fei Tian, Chonglin Sun, Frank Shyu, Luke Simon, Sandeep Pandey, Xi Liu, Jian Li

General AI

Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or rewriting the input, which can discard or …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Stabilizing Efficient Reasoning with Step-Level Advantage Selection

2026-04-27 · Han Wang, Xiaodong Yu, Jialian Wu, Jiang Liu, Ximeng Sun, Mohit Bansal, Zicheng Liu

General AI

Large language models (LLMs) achieve strong reasoning performance by allocating substantial computation at inference time, often generating long and verbose reasoning traces. While recent work on efficient reasoning reduces this overhead through length-based rewards or pruning, many approaches are post-trained under a …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization

2026-05-01 · Zi-Bo Qin, Feng-Feng Wei, Tai-You Chen, Wei-Neng Chen

General AI

Distributed black-box consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns, which often struggle…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.5

MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound

2026-05-12 · Phu-Hoa Pham, Chi-Nguyen Tran, Nguyen Lam Phu Quy, Dao Sy Duy Minh, Huynh Trung Kiet, Long Tran-Thanh

Research Track A · General AI

Streaming decision trees are natural candidates for open-world continual learning, as they perform local updates, enjoy bounded memory, and keep static decision boundaries. Despite these advantages, they still fail in online class-incremental learning due to two coupled miscalibrations: (i) their split criterion grows unreliable as th…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

2026-06-04 · Yang Li, Jiaxiang Liu, Jiang Cai, Mingkun Xu

General AI

A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting now. Standard tool-use agents answer the literal question and stop. AURA inserts an inference step between scene perception and tool use th…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

IR3DE: A Linear Router for Large Language Models

2026-06-04 · Eros Fanì, Oğuzhan Ersoy

General AI

Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks, and achieve remarkable results on various specialized tasks via domain-expert LLMs. With the ever-growing list of available LLMs, inference routers are being proposed to select the most appropriate LLM for each prompt. H…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.5

HydraCIL: Decoupled Class-Incremental Learning through Prototype-Guided Multi-Head Classifiers

2026-06-08 · Daniel Vila-Cruz, Laura Morán-Fernández, Verónica Bolón-Canedo

Research Track A

We present HydraCIL, a decoupled continual learning model based on prototype-guided multi-head classifiers, targeting sustainable deployment in embedded and resource-constrained environments. While most Class-Incremental Learning (CIL) methods rely on powerful hardware and long retraining cycles, real-world systems, su…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.5

Preserving Plasticity in Continual Learning via Dynamical Isometry

2026-06-08 · Andries Rosseau, Robert Müller, Ann Nowé

Research Track A · General AI

Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, eventually limiting further learning. We relate plasticity to the empirical Neural Tangent Kernel, and identify dynamical isometry (the condition that layer-wise Jacobian singular values remain close to on…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization

2026-06-15 · Prasanth YSS, Zhichen Ren, Rasa Hosseinzadeh, Ilan Gofman, Yuqi Chen, Zhaoyan Liu, Guangwei Yu, Jesse C. Cresswell, Satya Krishna Gorti

General AI

Reinforcement learning with verifiable rewards (RLVR) improves language-model reasoning, but GRPO-style optimization remains prone to collapse. We analyse this instability through token-level gradient dynamics, deriving a taxonomy that predicts how updates affect next-token probabilities and entropy. The taxonomy shows…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.5

Dimensionality Controls When Modularity Helps in Continual Learning

2026-06-16 · Kathrin Korte, Christian Medeiros Adriano, Joachim Winther Pedersen, Eleni Nisioti, Sebastian Risi

Research Track A

Compositional learning systems must balance plasticity, the ability to acquire new knowledge, with stability, the preservation of previously learned components, especially when tasks share structure and risk interference. We study how modular architecture, task similarity, and representational dimensionality jointly sh…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

2026-06-16 · Yatai Ji, An-Chieh Cheng, Yang Fu, Yukang Chen, Han Zhang, Zhaojing Yang, Wei Huang, Ka Chun Cheung, Song Han, Vidya Nariyambut Murali, Pavlo Molchanov, Jan Kautz, Simon See, Hongxu Yin, Ping Luo, Sifei Liu

General AI

Spatial VLMs have made substantial progress in geometric perception, yet complex spatial reasoning requiring multi-step inference over depth, distance, and scene relations remains challenging. Moreover, different spatial queries call for fundamentally different strategies: some are best addressed through purely linguis…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.5

Sumi: Open Uniform Diffusion Language Model from Scratch

2026-06-17 · Mengyu Ye, Keito Kudo, Wataru Ikeda, Ryosuke Matsuda, Keisuke Sakaguchi, Jun Suzuki

General AI

Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.5

DOPD: Dual On-policy Distillation

2026-06-29 · Xinlei Yu, Gen Li, Qingyi Si, Guibin Zhang, Yuqi Xu, Congcong Wang, Shuai Dong, Kaiwen Tuo, Xiangyu Zeng, Kaituo Feng, Qunzhong Wang, Yang Shi, Xiaobin Hu, Xiangyu Yue, Jiaqi Wang, Shuicheng Yan

Research Track A · General AI

On-policy distillation (OPD) offers superior capacity transfer by supervising student-sampled trajectories with dense token-level signals. To furnish high-quality supervision sources and thereby elevate the performance frontier of distillation, an intuitive direction is to infuse privileged information to either teache…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 11.5

Theory of Continual Learning Against Data Poisoning Attacks

2026-06-29 · Yiting Hu, Lingjie Duan

Research Track A · General AI

Continual learning (CL), where a model is trained on a sequence of data tasks, is increasingly being adopted across key fields such as large language models and image recognition, yet it remains highly vulnerable to data poisoning that triggers learning divergence or severe excess risk. Despite these threats, a princip…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.4

AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction

2026-06-22 · Shanhui Zhao, Jiacheng Liu, Guohong Liu, Jichao Yan, Jialei Ye, Yuhao Yang, Hao Wen, Shizuo Tian, Yizhen Yuan, Yuxuan Chen, Yunxin Liu, Ju Ren, Ya-Qin Zhang, Chao Huang, Yao Guo, Yuanchun Li

General AI

AI agents are driving a new software paradigm, with the ability to autonomously call tools, extract information, manage memory, and complete tasks that span applications and data sources. Most existing end-user operating systems, however, are designed for application-centric workflows and offer little native support fo…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 11.4

Are Text-to-Image Models Inductivist Turkeys? A Counterfactual Benchmark for Causal Reasoning

2026-06-23 · Jiayi Lei, Yuandong Pu, Xingyu Han, Rongpeng Zhu, Jing Xu, Jinyao Wang, Zijian Zhou, Bin Fu, Yuewen Cao, Yihao Liu, Yongsheng Li

General AI

Text-to-image (T2I) generation models have achieved remarkable progress in producing visually realistic images from natural language prompts. Yet it remains unclear whether their success reflects genuine causal understanding or sophisticated pattern matching over visual-textual correlations. Inspired by Russell's induc…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.4

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

2026-06-23 · Yuru Wang, Lejun Cheng, Yuxin Zuo, Sihang Zeng, Bingxiang He, Che Jiang, Junlin Yang, Yuchong Wang, Kaikai Zhao, Weifeng Huang, Kai Tian, Zhenzhao Yuan, Jincheng Zhong, Weizhi Wang, Ning Ding, Bowen Zhou, Kaiyan Zhang

General AI

We introduce NatureBench, a cross-discipline benchmark of 90 tasks distilled from peer-reviewed Nature-family publications, designed to evaluate whether AI coding agents can move beyond reproduction toward discovery on real scientific problems. NatureBench is built on NatureGym, an automated pipeline that constructs a …

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.4

Trimming the Long-Tail of Visual World Modeling Evaluation

2026-06-23 · Bingxuan Li, Yining Hong, Cheng Qian, Hyeonjeong Ha, Jiateng Liu, Zhenhailong Wang, Yue Guo, Yunzhu Li, Heng Ji

General AI

Physical interactions follow a long-tailed distribution: a set of common and regular interactions dominates human experience and visual data, while a broad spectrum of rare and irregular interactions remains underrepresented. Although recent visual world models, including image and video generation models, achieve impr…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.4

Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

2026-06-24 · Fangzheng Li, Aimin Zhang, Chen Lv

General AI

Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a reproducible phenomenon observed in a production Agent system: when Tool Calling and JSON Schema constraints are simultane…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.4

TheoremGraph: Bridging Formal and Informal Mathematics

2026-06-24 · Simon Kurgan, Evan Wang, Eric Leonen, Sophie Szeto, Luke Alexander, Artemii Remizov, Jarod Alper, Giovanni Inchiostro, Vasily Ilin

General AI

Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level d…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation

2026-03-25 · Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu

Research Track A · General AI

Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks remains fundamentally limited by the absence of long-term memory, causal failure attribution, and dynamic intervention c…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

2026-03-26 · Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li

General AI

Human driving behavior is inherently personal, shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectiv…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Label-Free Cross-Task LoRA Merging with Null-Space Compression

2026-03-27 · Wonyoung Lee, Wooseong Jeong, Kuk-Jin Yoon

General AI

Model merging combines independently fine-tuned checkpoints without joint multi-task training. In the era of foundation models, fine-tuning with Low-Rank Adaptation (LoRA) is prevalent, making LoRA merging a promising target. Existing approaches can work in homogeneous settings where all target tasks are classification …

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding

2026-03-30 · Haozhe Qi, Kevin Qu, Mahdi Rad, Rui Wang, Alexander Mathis, Marc Pollefeys

General AI

Long video understanding remains challenging for Multi-modal Large Language Models (MLLMs) due to high memory costs and context-length limits. Prior approaches mitigate this by scoring and selecting frames/tokens within short clips, but they lack a principled mechanism to (i) compare relevance across distant video clip…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning

2026-03-30 · Philip Schroeder, Thomas Weng, Karl Schmeckpeper, Eric Rosen, Stephen Hart, Ondrej Biza

General AI

Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enablin…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Do Emotions in Prompts Matter? Effects of Emotional Framing on Large Language Models

2026-04-02 · Minda Zhao, Yutong Yang, Chufei Peng, Rachel Gonsalves, Weiyue Li, Ruyi Yang, Zhixi Liu, Mengyu Wang

General AI

Emotional tone is pervasive in human communication, yet its influence on large language model (LLM) behaviour remains unclear. Here, we examine how first-person emotional framing in user-side queries affects LLM performance across six benchmark domains, including mathematical reasoning, medical question answering, readi…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Model-Based Reinforcement Learning for Control under Time-Varying Dynamics

2026-04-02 · Klemens Iten, Bruce Lee, Chenhao Li, Lenart Treven, Andreas Krause, Bhavya Sukhija

General AI

Learning-based control methods typically assume stationary system dynamics, an assumption often violated in real-world systems due to drift, wear, or changing operating conditions. We study reinforcement learning for control under time-varying dynamics. We consider a continual model-based reinforcement learning setting…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning

2026-04-04 · Hessen Bougueffa Eutamene, Abdellah Zakaria Sellam, Abdelmalik Taleb-Ahmed, Abdenour Hadid

Research Track A · General AI

Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models; however, while pixel-level artifacts vary across models, frequency-domain signatures exhibit greater consistency, providing a promising foundation for cross-generator…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Early Stopping for Large Reasoning Models via Confidence Dynamics

2026-04-06 · Parsa Hosseini, Sumit Nawathe, Mahdi Salmani, Meisam Razaviyayn, Soheil Feizi

General AI

Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the final answer. In this wo…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Paper Espresso: From Paper Overload to Research Insight

2026-04-06 · Mingzhe Du, Luu Anh Tuan, Dong Huang, See-kiong Ng

Research Track A · General AI

The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries w…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

2026-04-06 · LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, Aviral Kumar

General AI

Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" m…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

GRASS: Gradient-based Adaptive Layer-wise Importance Sampling for Memory-efficient Large Language Model Fine-tuning

2026-04-09 · Kaiyuan Tian, Yu Tang, Gongqingjian Jiang, Baihui Liu, Yifu Gao, Xialin Su, Linbo Qiao, Dongsheng Li

General AI

Full-parameter fine-tuning of large language models is constrained by substantial GPU memory requirements. Low-rank adaptation methods mitigate this challenge by updating only a subset of parameters. However, these approaches often limit model expressiveness and yield lower performance than full-parameter fine-tuning. …

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

RewardFlow: Generate Images by Optimizing What You Reward

2026-04-09 · Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou

General AI

We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Ambivalence/Hesitancy Recognition in Videos for Personalized Digital Health Interventions

2026-04-13 · Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Lorenzo Sia, Nicolas Richet, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger

General AI

Using behavioural science, health interventions focus on behaviour change by providing a framework to help patients acquire and maintain healthy habits that improve medical outcomes. In-person interventions are costly and difficult to scale, especially in resource-limited regions. Digital health interventions offer a c…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

StarVLA-α: Reducing Complexity in Vision-Language-Action Systems

2026-04-13 · Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia

General AI

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex: as existing approaches vary substantially in architectures, training data, embodiment configurations, and benchmark-specific en…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

LIFE -- an energy efficient advanced continual learning agentic AI framework for frontier systems

2026-04-14 · Anne Lee, Gurudutt Hosangadi

Research Track A · General AI

The rapid advancement of AI has changed the character of HPC usage such as dimensioning, provisioning, and execution. Not only has energy demand been amplified, but existing rudimentary continual learning capabilities limit the ability of AI to effectively manage HPCs. This paper reviews emerging directions beyond monolith…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Toward Autonomous Long-Horizon Engineering for ML Research

2026-04-14 · Guoxin Chen, Jie Chen, Lei Chen, Jiale Zhao, Fanzhe Meng, Wayne Xin Zhao, Ruihua Song, Cheng Chen, Ji-Rong Wen, Kai Jia

General AI

Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon e…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning

2026-04-16 · Kiran Purohit, Ramasuri Narayanam, Soumyabrata Pal

General AI

Speculative decoding (SD) accelerates large language model inference by allowing a lightweight draft model to propose outputs that a stronger target model verifies. However, its token-centric nature allows erroneous steps to propagate. Prior approaches mitigate this using external reward models, but incur additional la…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Targeted Exploration via Unified Entropy Control for Reinforcement Learning

2026-04-16 · Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Ge Lan, Yue Wang

General AI

Recent advances in reinforcement learning (RL) have improved the reasoning capabilities of large language models (LLMs) and vision-language models (VLMs). However, the widely used Group Relative Policy Optimization (GRPO) consistently suffers from entropy collapse, causing the policy to converge prematurely and lose di…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Semantic Area Graph Reasoning for Multi-Robot Language-Guided Search

2026-04-17 · Ruiyang Wang, Hao-Lun Hsu, Jiwoo Kim, Miroslav Pajic

General AI

Coordinating multi-robot systems (MRS) to search in unknown environments is particularly challenging for tasks that require semantic reasoning beyond geometric exploration. Classical coordination strategies rely on frontier coverage or information gain and cannot incorporate high-level task intent, such as searching fo…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents

2026-04-18 · Jinchang Zhu, Jindong Li, Cheng Zhang, Jiahong Liu, Menglin Yang

Research Track A · General AI

Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversation history as unstructured embedding vectors, retrieving information through semantic similarity. This paradigm fails to …

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Beyond Meta-Reasoning: Metacognitive Consolidation for Self-Improving LLM Reasoning

2026-04-19 · Ziqing Zhuang, Linhai Zhang, Jiasheng Si, Deyu Zhou, Yulan He

Research Track A · General AI

Large language models (LLMs) have demonstrated strong reasoning capabilities, and as existing approaches for enhancing LLM reasoning continue to mature, increasing attention has shifted toward meta-reasoning as a promising direction for further improvement. However, most existing meta-reasoning methods remain episodic:…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

2026-04-21 · Yutian Chen, Shi Guo, Renbiao Jin, Tianshuo Yang, Xin Cai, Yawen Luo, Mingxin Yang, Mulin Yu, Linning Xu, Tianfan Xue

General AI

Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric cons…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling

2026-04-21 · Zhihong Zhang, Jie Zhao, Xiaojian Huang, Jin Xu, Zhuodong Luo, Xin Liu, Jiansheng Wei, Xuejin Chen

General AI

Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key challenges: lack of granularity in preference strength, textual styl…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views

2026-04-21 · Feihao Fang, My T. Thai, Yuanyuan Lei

General AI

Large Language Models (LLMs) still struggle with multi-step logical reasoning. Existing approaches either purely refine the reasoning chain in natural language form or attach a symbolic solver as an external module. In this work, we instead ask whether LLMs contain a shared internal logical subspace that simultaneously…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

FASTER: Value-Guided Sampling for Fast RL

2026-04-21 · Perry Dong, Alexander Swerdlow, Dorsa Sadigh, Chelsea Finn

General AI

Some of the most performant reinforcement learning algorithms today can be prohibitively expensive as they use test-time scaling methods such as sampling multiple action candidates and selecting the best one. In this work, we propose FASTER, a method for getting the benefits of sampling-based test-time scaling of diffu…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Pause or Fabricate? Training Language Models for Grounded Reasoning

2026-04-21 · Yiwen Qiu, Linjuan Wu, Yizhou Liu, Yuchen Yan, Jin Ma, Xu Tan, Yao Hu, Daoxin Zhang, Wenqi Zhang, Weiming Lu, Jun Xiao, Yongliang Shen

General AI

Large language models have achieved remarkable progress on complex reasoning tasks. However, they often implicitly fabricate information when inputs are incomplete, producing confident but unreliable conclusions -- a failure mode we term ungrounded reasoning. We argue that this issue arises not from insufficient reason…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance

2026-04-22 · Yupeng Zheng, Xiang Li, Songen Gu, Yuhang Zheng, Shuai Tian, Weize Li, Linbo Wang, Senyu Fei, Pengfei Li, Yinfeng Gao, Zebin Xing, Yilun Chen, Qichao Zhang, Haoran Li, Wenchao Ding

General AI

Recent advances in Vision-Language-Action (VLA) models have opened new avenues for robot manipulation, yet existing methods exhibit limited efficiency and a lack of high-level knowledge and spatial awareness. To address these challenges, we propose PokeVLA, a lightweight yet powerful foundation model for embodied manip…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

2026-04-22 · Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele

General AI

Large vision-language models (LVLMs) have demonstrated impressive performance in various multimodal understanding and reasoning tasks. However, they still struggle with object hallucinations, i.e., the claim of nonexistent objects in the visual input. To address this challenge, we propose Region-aware Chain-of-Verifica…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Context Unrolling in Omni Models

2026-04-23 · Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Chaorui Deng, Kunchang Li, Zihan Ding, Yuwei Guo, Fuyun Wang, Fangqi Zhu, Xiaonan Nie, Shenhan Zhu, Shanchuan Lin, Hongsheng Li, Weilin Huang, Guang Shi, Haoqi Fan

General AI

We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This p…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

MTT-Bench: Predicting Social Dominance in Mice via Multimodal Large Language Models

2026-04-24 · Yunquan Chen, Haoyu Chen

General AI

Understanding social dominance in animal behavior is critical for neuroscience and behavioral studies. In this work, we explore the capability of Multimodal Large Language Models (MLLMs) to analyze raw behavioral video of mice and predict their dominance hierarchy. We introduce MTT-Bench, a novel benchmark comprising an…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

NeuroClaw Technical Report

2026-04-27 · Cheng Wang, Zhibin He, Zhihao Peng, Shengyuan Liu, Yufan Hu, Lichao Sun, Xiang Li, Yixuan Yuan

General AI

Agentic artificial intelligence systems promise to accelerate scientific workflows, but neuroimaging poses unique challenges: heterogeneous modalities (sMRI, fMRI, dMRI, EEG), long multi-stage pipelines, and persistent reproducibility risks. To address this gap, we present NeuroClaw, a domain-specialized multi-agent re…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

2026-04-27 · Amal AKLI, Mike PAPADAKIS, Maxime CORDY, Yves Le TRAON

General AI

Large language models are increasingly used for code generation, yet the correctness of their outputs depends not only on model capability but also on how tasks are specified. Prior studies demonstrate that small changes in natural language prompts, particularly under-specification, can substantially reduce code correct…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation

2026-04-28 · Wei-Chun Chen, Yu-Xuan Chen, I-Fang Chung, Ying-Jia Lin

General AI

Accurate nutrient estimation from unstructured recipe text is an important yet challenging problem in dietary monitoring, due to ambiguous ingredient terminology and highly variable quantity expressions. We systematically evaluate models spanning a wide range of representational capacity, from lexical matching methods …

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

MarkIt: Training-Free Visual Markers for Precise Video Temporal Grounding

2026-04-28 · Pengcheng Fang, Yuxia Chen, Xiaohao Cai

General AI

Video temporal grounding (VTG) aims to localize the start and end timestamps of the event described by a given query within an untrimmed video. Despite the strong open-world video understanding and recognition ability of video large language models (Vid-LLMs), outputting precise temporal grounding information remains c…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

2026-04-30 · Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng

General AI

Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing explicit linguistic reasoning that suffers from latency and discretization, or utilizing more expressive continuous latent…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Synthetic Computers at Scale for Long-Horizon Productivity Simulation

2026-04-30 · Tao Ge, Baolin Peng, Hao Cheng, Jianfeng Gao

General AI

Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at S…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Let ViT Speak: Generative Language-Image Pre-training

2026-05-01 · Yan Fang, Mengcheng Lan, Zilong Huang, Weixian Lei, Yunqing Zhao, Yujie Zhong, Yingchen Yu, Qi She, Yao Zhao, Yunchao Wei

General AI

In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining framework for Vision Transformers (ViTs) designed for multimodal large language models (MLLMs). To better align vision encoders with the autoregressive nature of LLMs, GenLI…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion

2026-05-01 · Yuan Li, Jun Hu, Jiaxin Jiang, Bryan Hooi, Bingsheng He

General AI

Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constra…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

2026-05-01 · Sailesh Panda, Pritam Kadasi, Abhishek Upperwal, Mayank Singh

General AI

Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithfully execute the procedure specified in a prompt. We study this question through a controlled diagnostic benchmark for procedural execution, where models are given a st…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Noise-Aware Visual Representation Learning for Medical Visual Question Answering

2026-06-04 · I Putu Adi Pratama, Bahadorreza Ofoghi, Atul Sajjanhar, Shang Gao

General AI

Medical visual question answering (Med-VQA) has strong potential for clinical decision support by enabling AI models to interpret medical images and answer clinically relevant queries. Recent approaches typically connect off-the-shelf vision encoders with large language models (LLMs) through lightweight mapping network…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding

2026-06-04 · Shaohui Dai, Yansong Qu, You Shen, Shengchuan Zhang, Liujuan Cao

General AI

Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part stru…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

2026-06-04 · Xinnong Zhang, Wanting Shan, Hanjia Lyu, Zhongyu Wei, Jiebo Luo

General AI

Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether they are highly sensitive to semantically independent changes in conversational c…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

2026-06-04 · Zhuoming Chen, Xinrui Zhong, Qilong Feng, Ranajoy Sadhukhan, Yang Zhou, Michael Qizhe Shieh, Zhihao Jia, Beidi Chen

General AI

Sparse attention is becoming increasingly important for serving large language models (LLMs) as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive, slowing both human researchers and AI agents in exploring the sparse atten…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Correlation Is Not Enough: Embedding Human Metadata for Individual Causal Discovery

2026-06-08 · Suraj Biswas, Saurabh Gupta, Pritam Mukherjee

General AI

Ask a pretrained biomedical language model whether "cortisol 28 ug/dL" and "stock-market volatility" are related, and it returns a cosine similarity of 0.83 on a scale where 1.0 means identical. The two share no mechanism. This is not a corner case: every off-the-shelf biomedical encoder we tested (BioBERT, PubMedBERT,…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models

2026-06-08 · Hao Shi, Weiye Li, Bin Xie, Yulin Wang, Renping Zhou, Tiancai Wang, Xiangyu Zhang, Ping Luo, Gao Huang

Research Track A · General AI

Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past interactions and imagination of future states. However, most VLA models rely primarily on the current observation and therefore struggle with long-horizon, temporally dependent tasks. Cognitive science suggests th…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

2026-06-08 · Mingxian Lin, Shengju Qian, Yuqi Liu, Yi-Hua Huang, Yiyu Wang, Wei Huang, Yitang Li, Fan Zhang, Zeyu Hu, Lingting Zhu, Xin Wang, Xiaojuan Qi

General AI

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs,…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5

2026-06-10 · Haotao Xie

General AI

Recently, large language models (LLMs) have achieved promising progress in the fields of classical Chinese translation and the generation of classical poetry. However, domain-specific research on precise translation and affective-semantic understanding of classical poetry remains limited. The main challenge is that mos…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning

2026-06-11 · Zach Studdiford, Gary Lupyan

General AI

When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's behavior does not exhibit the same types of failures because human reasoning use…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

A Multi-Center Benchmark for Abdominal Disease Diagnosis and Report Generation from Non-Contrast CT

2026-06-15 · Mariam Elbakry, Aliaa Sayed Sheha, Salma Hassan Tantawy, Aya Yassin, Concetto Spampinato, Karim Lekadir, Xiaomeng Li, Marawan Elbatel

General AI

Multiphasic contrast-enhanced CT (CECT) is widely used for abdominal lesion characterization, yet it carries inherent risks of contrast-induced nephropathy, escalates acquisition burden, and heavily contributes to radiologist workload. To address these challenges, we introduce a novel multi-center benchmark for multi-o…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Selection Without Signal, Recovery Through Expression: A Measurement Study of Post-Hoc Falsification Operators for Frozen Small Code Models

2026-06-15 · Mehmet Iscan

General AI

Frozen small code models (<=1.5B parameters, run locally without fine-tuning) suit offline and privacy-constrained use, but often emit plausible-but-wrong programs. A natural remedy is a post-hoc operator that selects, verifies, repairs, or re-processes the model's samples without retraining; in principled form it is P…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents

2026-06-16 · Ankita Samaddar, Sandeep Neema, Daniel Balasubramanian, Xenofon Koutsoukos

General AI

With sophisticated cyber-attacks becoming increasingly prevalent, modern networks require intelligent autonomous cyber-defense agents trained via Reinforcement Learning (RL). These agents employ neurosymbolic approaches such as behavior trees with learning-enabled components (LECs) to learn, reason, adapt, and implemen…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Plug-and-Adapt: Multimodal Coreference Resolution at First Sight with a Pretrained Alignment Model

2026-06-16 · Jinghan Wu, Jing Li, Ivor W. Tsang, Xuetao Zhang

General AI

Visual information helps resolve ambiguity in coreference resolution, leading to notable performance gains. However, existing Multi-modal Coreference Resolution (MCR) methods require training with (partially) annotated data from the target dataset before they can be applied, preventing their direct usability and raisin…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills

2026-06-16 · Weizhi Zhang, Zechen Li, Hamid Palangi, Ben Graef, A. Ali Heydari, Simon A. Lee, Salman Rahman, Ray Luo, Zeinab Esmaeilpour, Erik Schenck, Chloe Zhang, Yamin Li, Menglian Zhou, Philip S. Yu, Daniel McDuff, Lindsey Sunden, Mark Malhotra, Shwetak Patel, Ahmed A. Metwally

General AI

The LLM-empowered personal health agents with user health (sensor) metrics have offered a promising pathway to alleviate global disparities in healthcare access. However, large-scale clinical deployment remains constrained by an open-ended evaluation bottleneck: physician annotation is reliable but costly and unscalabl…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

A Multi-Domain Benchmark for Detecting AI-Generated Text-Rich Images from GPT-Image-2

2026-06-17 · Yijin Wang, Shuyi Wang, Wenhan Zhang, Yuqi Ouyang

General AI

Text-rich images often contain privacy-sensitive, transactional, or decision-relevant information. As recent multimodal image generation models become increasingly capable of synthesizing realistic textual content and structured visual designs, detecting AI-generated text-rich images has become an important challenge f…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

2026-06-17 · Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev, Daria Pugacheva, Albina Burlova, Mikhail Kolosov, Denis Shepelev, Andrey Kuznetsov, Elena Tutubalina, Aleksandr I. Panov, Alexey K. Kovalev, Vlad Shakhuro

Research Track A · General AI

Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet it is unclear how much commonsense and factual knowledge they retain after adaptation. Failures on knowledge-sensitive tasks are ambiguous, conflating missing knowledge with poor generalizat…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.3

Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA

2026-06-17 · Ikram Belmadani, Oumaima El Khettari, Carlos Ramisch, Frederic Bechet, Richard Dufour, Benoit Favre

General AI

The development of large language models (LLMs) has led to an increased focus on their adaptation to specialized domains and languages, yet the effectiveness of domain adaptation strategies remains unclear. We present a study of medical domain adaptation using French medical question-answering (QA) as a case study. We …

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.2

Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking

2026-04-08 · Jiwan Chung, JiHyuk Byun, Vibhav Vineet, Seon Joo Kim

Research Track B · General AI

Web agents act through long interaction sequences, yet existing benchmarks evaluate only terminal success, discarding all process information and offering little guidance on improvement. In this work, we conduct a process-level analysis of web agents. We introduce WebStep, a benchmark of 1,800 task instances with contr…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 11.2

BenchX: Benchmarking AI Models for Cancer Detection and Localization with Demographic and Protocol Biases

2026-06-23 · Qi Chen, Wenxuan Li, Pedro R. A. S. Bassi, Xinze Zhou, Jakob Wasserthal, Ibrahim Ethem Hamamci, Sezgin Er, Ashwin Kumar, Yiwen Ye, Yuhan Wang, Yuyin Zhou, Akshay S. Chaudhari, Curtis Langlotz, Kang Wang, Yang Yang, Alan L. Yuille, Zongwei Zhou

General AI

Artificial intelligence (AI) has achieved remarkable success in medical imaging, but it is widely recognized that these models often perform inconsistently across real-world clinical settings. Such inconsistencies occur when patient demographics and imaging protocols vary, for example, in detecting small tumors, analyz…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.2

An Approach for a Supporting Multi-LLM System for Automated Certification Based on the German IT-Grundschutz

2026-06-24 · Lea Roxanne Muth, Marian Margraf

Research Track A · General AI

This paper presents a novel approach to perform semi-automated BSI IT-Grundschutz certification using a Multi-Large Language Model system (MLS) with Hybrid Retrieval-Augmented Generation (HybridRAG). Facing the challenges of the Network and Information Security Directive 2 (NIS2), a shortage of specialists, and…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.2

Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning

2026-06-24 · Poojitha Thota, Shirin Nilizadeh

General AI

Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model behavior. In this setting, adversaries manipulate fine-tuning data to induce persistent sum…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.2

RevengeBench: Reverse Engineering Code-Space Policies from Behavioral Experiments

2026-06-24 · Babak Rahmani, Sebastian Dziadzio, Joschka Strüber, Sergio Hernández-Gutiérrez, Matthias Bethge

General AI

For most of scientific history, researchers studying behavior could only infer hidden mechanisms from outward actions: an inverse problem that becomes more tractable when observation is augmented by targeted intervention. We pose a computational analogue: given only behavioral traces of an agent in a game environment, …

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.1

QFedAgent: Quantum-Enhanced Personalized Federated Learning for Multi-Agent Activity Recognition

2026-07-02 · Quoc Bao Phan, Tuy Tan Nguyen

General AI

Federated learning (FL) enables collaborative model training across distributed devices without sharing raw data, making it suitable for privacy-sensitive robotic sensing applications. However, multi-agent systems generate heterogeneous and non-independent and identically distributed (non-IID) multimodal sensor streams…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 11.0

Bi-CRCL: Bidirectional Conservative-Radical Complementary Learning with Pre-trained Foundation Models for Class-incremental Medical Image Analysis

2026-03-24 · Xinyao Wu, Zhe Xu, Cheng Chen, Jiawei Ma, Yefeng Zheng, Raymond Kai-yu Tong

Research Track A · General AI

Class-incremental learning (CIL) in medical image-guided diagnosis requires retaining prior diagnostic knowledge while adapting to newly emerging disease categories, which is critical for scalable clinical deployment. This problem is particularly challenging due to heterogeneous data and privacy constraints that preven…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 11.0

Critical Patch-Aware Sparse Prompting with Decoupled Training for Continual Learning on the Edge

2026-04-08 · Wonseon Lim, Jaesung Lee, Dae-Won Kim

Research Track A · General AI

Continual learning (CL) on edge devices requires not only high accuracy but also training-time efficiency to support on-device adaptation under strict memory and computational constraints. While prompt-based continual learning (PCL) is parameter-efficient and achieves competitive accuracy, prior work has focused mainly…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.0

From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation

2026-04-14 · Chuang Peng, Wei Zhang, Renshuai Tao, Xinhao Zhang, Jian Yang

Research Track B · General AI

Text-based web agents offer computational efficiency for autonomous web navigation, yet developing robust agents remains challenging due to the noisy and heterogeneous nature of real-world HTML. Standard Supervised Fine-Tuning (SFT) approaches fail in two critical dimensions: they lack discrimination capabilities to re…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.0

Mistake gating leads to energy and memory efficient continual learning

2026-04-15 · Aaron Pache, Mark CW van Rossum

Research Track A · General AI

Synaptic plasticity is metabolically expensive, yet animals continuously update their internal models without exhausting energy reserves. However, when artificial neural networks are trained, the network parameters are typically updated on every sample that is presented, even if the sample was classified correctly. Ins…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.0

Agentic Fusion of Large Atomic and Language Models to Accelerate Superconductors Discovery

2026-04-29 · Mingze Li, Yu Rong, Songyou Li, Lihong Wang, Jiacheng Cen, Liming Wu, Anyi Li, Zongzhao Li, Qiuliang Liu, Rui Jiao, Tian Bian, Pengju Wang, Hao Sun, Jianfeng Zhang, Ji-Rong Wen, Deli Zhao, Shifeng Jin, Tingyang Xu, Wenbing Huang

General AI

The discovery of novel materials is critical for global energy and quantum technology transitions. While deep learning has fundamentally reshaped this landscape, existing predictive or generative models typically operate in isolation, lacking the autonomous orchestration required to execute the full discovery process. …

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.0

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

2026-05-01 · Dongxin Guo, Jikun Wu, Siu Ming Yiu

Research Track B · General AI

AI agents execute tens to hundreds of chained LLM calls per task, yet GPU schedulers treat each call as independent, discarding gigabytes of intermediate state between steps and inflating end-to-end latency by 3-8x. We argue that this request-level abstraction is fundamentally mismatched to compound AI workloads, and p…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.0

Autonomous Drift Learning in Data Streams: A Unified Perspective

2026-05-02 · Xiaoyu Yang, En Yu, Jie Lu

Research Track A

In the pursuit of autonomous learning systems, the foundational assumption of stationarity, the premise that data distributions and model behaviors remain constant, is fundamentally untenable. Historically, the research community has addressed non-stationary environments almost exclusively under the scope of concept dr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.0

BAMI: Training-Free Bias Mitigation in GUI Grounding

2026-05-07 · Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu

Research Track B · General AI

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution metho…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.0

BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models

2026-05-07 · Xin Gao, Ruiyi Zhang, Meixi Du, Peijia Qin, Pengtao Xie

Research Track A · General AI

Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensiv…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.0

On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR

2026-05-07 · Hao Ye, Jisheng Dang, Junfeng Fang, Bimei Wang, Yizhou Zhang, Ning Lv, Wencan Zhang, Hong Peng, Bin Hu, Tat-Seng Chua

Research Track A · General AI

Recent extensive research has demonstrated that the enhanced reasoning capabilities acquired by models through Reinforcement Learning with Verifiable Rewards (RLVR) are primarily concentrated within the rank-1 components. Predicated on this observation, we employed Periodic Rank-1 Substitution and identified a counteri…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.0

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm

2026-05-21 · Karan Goyal

General AI

The rapid proliferation of Vision-Language Models (VLMs) is often framed as enabling unified multimodal knowledge discovery but rests on an under-examined assumption: that current VLMs faithfully synthesise multimodal data. We argue they often do not, and this gap reflects a trustworthiness problem in the dominant Visi…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.0

Understanding Data Temporality Impact on Large Language Models Pre-training

2026-05-21 · Pilchen Hippolyte, Fabre Romain, Signe Talla Franck, Perez Patrick, Grave Edouard

Research Track A · General AI

Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of pre-training dynamics on the acquisition of time-sensitive factual knowledge, focusing specifically…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.0

When Confidence Misleads: Suffix Anchoring and Anchor-Proximity Confidence Modulation for Diffusion Language Models

2026-05-27 · Jungwon Park, Jimyeong Kim, Jungmin Ko, Nojun Kwak, Wonjong Rhee

General AI

Diffusion language models decode text by iteratively denoising masked token sequences, making the choice of which positions to decode a central inference-time decision. Most training-free decoding strategies use model confidence for position selection, assuming that high-confidence positions are ready to be decoded. In…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.0

Reducing Political Manipulation with Consistency Training

2026-05-28 · Long Phan, Devin Kim, Alexander Pan, Alice Blair, Adam Khoja, Dan Hendrycks

General AI

Large language models (LLMs) exhibit systematic political bias across a variety of sensitive contexts. We find that LLMs handle counterpart topics from opposing political sides asymmetrically. We refer to this phenomenon as covert political bias and identify 7 categories of techniques through which it operates. We prop…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.0

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

2026-05-28 · Corrado Rainone, Davide Belli, Bence Major, Arash Behboodi

General AI

The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substantially high cost, and more cost-efficient small language models (SLMs), which are amenable to on-device inference. Hybr…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.0

When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents

2026-06-18 · Kaiyue Yang, Yuyan Bu, Jingwei Yi, Yuchi Wang, Biyu Zhou, Juntao Dai, Songlin Hu, Yaodong Yang

General AI

As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool sel…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.0

Whose Agent Are You? Multi-Layer Fingerprinting and Attribution of Autonomous Web Agents

2026-06-18 · Dayeon Kang, Hyejun Jeong, Jade Sheffey, Pubali Datta, Amir Houmansadr

Research Track B · General AI

As AI web agents proliferate, combining large language models with autonomous, browser-level control, indiscriminate content scraping by web agents has emerged as a privacy and security challenge. Existing defenses, such as robots.txt and active bot-blocking, are insufficient, as they are widely violated and easily cir…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.0

Learning Transferable Dynamics Priors from Action to World Modeling

2026-06-28 · Ze Huang, Jiahui Zhang, Hairuo Liu, Chenxi Zhang, Ran Cheng, Li Zhang

General AI

We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pre…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 11.0

Morphing into Hybrid Attention Models

2026-06-29 · Disen Lan, Jianbin Zheng, Yuxi Ren, Xin Xia, Xuanda Wang, Xuefeng Xiao, Xipeng Qiu, Yu Cheng

General AI

Hybrid attention models improve long-context efficiency by retaining only a subset of full-attention layers and replacing the remaining layers with linear attention. However, the effectiveness of Transformer-to-hybrid conversion critically depends on which layers preserve full attention. Existing hybrid layer selection…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 11.0

Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

2026-06-29 · Zhiqi Li, Chengrui Dong, Zhenhua Du, Hangning Zhou, Cong Qiu, Hailong Qin, Mu Yang, Dongxu Wei, Peidong Liu

General AI

Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. We propose Walking in the Implicit, a scene-centric paradigm that changes the rollout variable from frame latents to a fi…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.9

Can Scale Save Us From Plasticity Loss in Large Language Models?

2026-06-23 · J. Fernando Hernandez-Garcia, Tomás Figliolia, Beren Millidge

Research Track A · General AI

The loss of plasticity, the ability of a network to learn new information after having already learned older information, is a fundamental challenge in creating artificial neural networks capable of continual learning. Although this phenomenon has been known for decades, it has mostly been studied in older, relativel…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

2026-05-04 · Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong

General AI

Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, assigning high rewards to flawed reasoni…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

2026-05-07 · Omar El Khalifi, Thomas Rossi, Oscar Fossey, Thibault Fouque, Ulysse Mizrahi, Philip Torr, Ivan Laptev, Fabio Pizzati, Baptiste Bellot-Gurlet

General AI

For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research

2026-05-07 · Lujia Zhong, Yihao Xia, Jianwei Zhang, Shuo Huang, Jiaxin Yue, Mingyang Xia, Yonggang Shi

General AI

Multimodal neuroimaging analysis often involves complex, modality-specific preprocessing workflows that require careful configuration, quality control, and coordination across heterogeneous toolchains. Beyond preprocessing, downstream statistical analysis and disease classification commonly require task-specific code, …

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

Aligning Flow Map Policies with Optimal Q-Guidance

2026-05-12 · Christos Ziakas, Alessandra Russo, Avishek Joey Bose

General AI

Generative policies based on expressive model classes, such as diffusion and flow matching, are well-suited to complex control problems with highly multimodal action distributions. Their expressivity, however, comes at a significant inference cost: generating each action typically requires simulating many steps of the …

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning

2026-05-12 · Hannes Büchi, Manon Flageat, Eduardo Sebastián, Amanda Prorok

General AI

Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve, and to do so at the right moment. Yet, current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity are still limited by the fact that they bind fixed behaviors to fixed agent identities…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

LLM Code Smells: A Taxonomy and Detection Approach

2026-05-21 · Zacharie Chenail-Larcher, Brahim Mahmoudi, Naouel Moha, Quentin Stiévenart, Florent Avellaneda

General AI

Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, due to their versatility, flexibility, and ability to simulate human reasoning to some extent. However, poor integration of LLM inference in source code can undermine software system quality. Therefore, inadequate LLM i…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

2026-05-22 · Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou, Zisu Huang, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo

General AI

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents

2026-05-28 · Anany Kotawala

General AI

Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is locally coherent. We formalise this locally coherent, globally incoherent failure via the compositional residual eps*, th…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations

2026-05-28 · Qinpei Luo, Ruichun Ma, Xinyu Zhang, Lili Qiu

General AI

Printed circuit board (PCB) schematic design defines nearly all electronic hardware, but it remains manual and expertise-intensive. While generative AI has advanced digital and analog IC design, PCB schematic generation from natural-language intent is largely unexplored. This paper presents SchGen, the first large lang…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language

2026-06-29 · Haitao Wu, Qirui Zhang, Zhouheng Yao, Shangquan Sun, Qihao Zheng, Mianxin Liu, Chi Zhang, Wanli Ouyang, Chunfeng Song, Changqing Zhang, Jiamin Wu

General AI

Modeling the bidirectional correspondence between external sensory stimuli and internal neural activity has emerged as a critical frontier in neuroscience. However, existing approaches predominantly treat brain encoding and decoding as isolated tasks, relying heavily on unimodal alignment and external priors while over…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

LeVo 2: Stable and Melodious Song Generation via Hierarchical Representation Modeling and Progressive Post-Training

2026-06-29 · Shun Lei, Huaicheng Zhang, Dapeng Wu, Yaoxun Xu, Lishi Zuo, Wei Tan, Hangting Chen, Guangzheng Li, Jianwei Yu, Zhiyong Wu, Dong Yu

General AI

Full-length song generation must preserve coherence and musicality, render detailed vocal and accompaniment acoustics, and follow lyrics and prompts. Existing language model-based systems face a structural trade-off: mixed-token modeling preserves vocal-instrument coordination but obscures track-specific details, where…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

PyMETA: A Benchmark Dataset for Hierarchical Student Code Error Classification with Python-Interpreter-Based Labels

2026-06-29 · Chuyue Li, Ziqi Tang, Jingyi Wang, Yu Wu, Kazuma Hashimoto, Lingyu Gao

General AI

With the advancement of Large Language Models (LLMs), code error detection has extended beyond traditional IDE diagnostics to context-sensitive debugging in educational scenarios. However, existing approaches lack large-scale datasets, multi-error analysis, and unified error taxonomies. To address this, we introduce Py…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.8

UnfoldArt: Zero-Shot Recovery of Full Articulated 3D Objects from Text or Image

2026-06-29 · Mohamed El Amine Boudjoghra, Ivan Laptev, Angela Dai

General AI

Articulated 3D objects are essential for interactive environments in embodied AI, robotics, and virtual reality, but reconstructing their structure and motion from sparse observations remains challenging. Existing approaches remain largely constrained by lack of supervised data or lack the priors needed to reliably rec…

Review: pending
Role: unreviewed
Read: now

huggingface Score 10.8

AutoMem: Automated Learning of Memory as a Cognitive Skill

2026-07-01 · Shengguang Wu, Hao Zhu, Yuhui Zhang, Xiaohan Wang, Serena Yeung-Levy

General AI

Memory expertise is a learned skill: knowing what to encode, when to retrieve, and how to organize knowledge, a capacity known in cognitive science as metamemory. We bring this perspective to LLMs by treating memory management as a trainable skill. We promote file-system operations to first-class memory actions alongsi…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 10.7

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

2026-06-18 · Guangyi Liu, Gao Wu, Congxiao Liu, Pengxiang Zhao, Liang Liu, Mading Li, Qi Zhang, Mengyan Wang, Liang Guo, Yong Liu

Research Track B · General AI

MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 10.6

Distributed Attacks in Persistent-State AI Control

2026-07-02 · Josh Hills, Ida Caspary, Asa Cooper Stickland

General AI

As AI coding agents become more autonomous, they increasingly ship code iteratively, with the codebase persisting across sessions. This persistence creates a new attack surface: a misaligned or prompt-injected agent can distribute attacks across pull requests (PRs) and time its payload for the PR with the best natural …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 10.6

Human Capital, Not Model Benchmarks, Predicts Hybrid Intelligence in Forecasting

2026-07-02 · Vivienne Ming

General AI

Whether pairing people with AI helps or hurts is usually reported as a single average effect. Using a real-money prediction market (Polymarket) as an objective, externally resolved benchmark, this pilot shows that the value of human-AI collaboration depends on a specific, measurable form of human capital. Analyzed at t…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 10.6

LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

2026-07-02 · Matteo Boglioni, Thibault Rousset, Siva Reddy, Marius Mosbach, Verna Dankers

General AI

LLMs memorize sensitive training data, including personally identifiable information (PII), creating a pressing need for reliable post hoc removal methods. Unlearning has emerged as a promising solution, with state-of-the-art (SOTA) methods often following a localize-first, unlearn-second paradigm that targets specific …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 10.5

Self-Execution Simulation Improves Coding Models

2026-03-11 · Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi

General AI

A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and tha…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

Evidence of an Emergent "Self" in Continual Robot Learning

2026-03-25 · Adidev Jhunjhunwala, Judah Goldfeder, Hod Lipson

Research Track A

A key challenge to understanding self-awareness has been a principled way of quantifying whether an intelligent system has a concept of a "self," and if so how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of cognitive process th…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 10.5

Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

2026-03-27 · Nicholas Edwards, Sebastian Schuster

General AI

As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally resolve underspecification by asking clarifying questions, current agents are largely optimize…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

Story2Proposal: A Scaffold for Structured Scientific Paper Writing

2026-03-28 · Zhuoyang Qian, Wei Shi, Xu Lin, Li Ling, Meng Luo, Ziming Wang, Zhiwei Zhang, Tengyue Xu, Gaoge Liu, Zhentao Zhang, Shuo Zhang, Ziqi Wang, Zheng Feng, Yan Luo, Shu Xu, Yongjin Chen, Zhibo Feng, Zhuo Chen, Bruce Yuan, Biao Wu, Harry Wang, Kris Chen

General AI

Generating scientific manuscripts requires maintaining alignment between narrative reasoning, experimental evidence, and visual artifacts across the document lifecycle. Existing language-model generation pipelines rely on unconstrained text synthesis with validation applied only after generation, often producing struct…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

SNID-SAGE: A Modern Framework for Interactive Supernova Classification and Spectral Analysis

2026-03-30 · Fiorenzo Stoppa, Stephen J. Smartt

Research Track A

We present SNID-SAGE (SuperNova IDentification-Spectral Analysis and Guided Exploration), a framework for supernova spectral classification with both a fully interactive graphical interface and a scriptable command-line pipeline for large-scale processing. The pipeline combines deterministic spectral preprocessing, FFT…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

ASI-Evolve: AI Accelerates AI

2026-03-31 · Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, Pengfei Liu

General AI

Can AI accelerate the development of AI itself? While recent agentic systems have shown strong performance on well-scoped tasks with rapid feedback, it remains unclear whether they can tackle the costly, long-horizon, and weakly supervised research loops that drive real AI progress. We present ASI-Evolve, an agentic fr…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

SeGPruner: Semantic-Geometric Visual Token Pruner for 3D Question Answering

2026-03-31 · Wenli Li, Kai Zhao, Haoran Jiang, Enquan Yang, Yi Su, Dan Zeng

General AI

Vision-language models (VLMs) have been widely adopted for 3D question answering (3D QA). In typical pipelines, visual tokens extracted from multiple viewpoints are concatenated with language tokens and jointly processed by a large language model (LLM) for inference. However, aggregating multi-view observations inevita…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

Forecasting Supply Chain Disruptions with Foresight Learning

2026-04-01 · Benjamin Turtel, Paul Wilczewski, Kris Skotheim

General AI

Anticipating supply chain disruptions before they materialize is a core challenge for firms and policymakers alike. A key difficulty is learning to reason reliably about infrequent, high-impact events from noisy and unstructured inputs - a setting where general-purpose models struggle without task-specific adaptation. …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation

2026-04-10 · Han Luo, Guy Laban

General AI

Large language models are increasingly deployed in multi-turn settings such as tutoring, support, and counseling, where reliability depends on preserving consistent roles, personas, and goals across long horizons. This requirement becomes critical when LLMs are used to generate synthetic dialogues for training and eval…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System

2026-04-15 · Tianshuo Yang, Guanyu Chen, Yutian Chen, Zhixuan Liang, Yitian Liu, Zanxin Chen, Chunpu Xu, Haotian Liang, Jiangmiao Pang, Yao Mu, Ping Luo

General AI

While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often compromises the profound reasoning capabilities inherited from their base Vision-Language Models (VLMs). To resolve this fundamental trade-off, we propose HiVLA, a visu…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

Reinforcement Learning via Value Gradient Flow

2026-04-15 · Haoran Xu, Kaiwen Hu, Somayeh Sojoudi, Amy Zhang

General AI

We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base model in LLM RL finetuning) is essential to prevent value over-optimization caused by erroneous out-of-distribution extrapolation. Existing methods either rely on repara…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

A Semantic Geometry for Uncovering Paradigm Dynamics via Scientific Publications

2026-04-16 · Jinchang Liu, Qingshan Zhou, Hongkan Chen, Yi Bu

Research Track A

Science advances not only by accumulating discovered patterns but by changing how new problems and solutions are expressed. While structural indicators track scholarly attention, they offer only an indirect proxy for the reorganization of meaning. We propose a semantic geometry based on the R-P-C (references, focal pub…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation

2026-04-18 · Jiaxin Zhang, Xiangyu Peng, Qinglin Chen, Qinyuan Ye, Caiming Xiong, Chien-Sheng Wu

Research Track A

On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps models in severe overconfidence. We trace this failure to an information mismatch: t…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting

2026-04-20 · Hyeonseo Jang, Hyuk Kwon, Kibok Lee

Research Track A

We investigate recently introduced domain-class incremental learning scenarios for vision-language models (VLMs). Recent works address this challenge using parameter-efficient methods, such as prefix-tuning or adapters, which facilitate model adaptation to downstream tasks by incorporating task-specific information int…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

2026-04-22 · Shanshan Zhong, Yi Lu, Jingjie Ning, Yibing Wan, Lihan Feng, Yuyi Ao, Leonardo F. R. Ribeiro, Markus Dreyer, Sean Ammirati, Chenyan Xiong

Research Track A · General AI

Skills have become the de facto way to enable LLM agents to perform complex real-world tasks with customized instructions, workflows, and tools, but how to learn them automatically and effectively remains unclear. We introduce SkillLearnBench, the first benchmark for evaluating continual skill learning methods, compris…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

Efficient Agent Evaluation via Diversity-Guided User Simulation

2026-04-23 · Itay Nakash, George Kour, Ateret Anaby-Tavor

General AI

Large language models (LLMs) are increasingly deployed as customer-facing agents, yet evaluating their reliability remains challenging due to stochastic, multi-turn interactions. Current evaluation protocols rely on linear Monte Carlo rollouts of complete agent-user conversations to estimate success. However, this appr…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction

2026-04-27 · Hongxin Li, Yuntao Chen, Zhaoxiang Zhang

Research Track B · General AI

Graphical User Interface (GUI) element grounding (precisely locating elements on screenshots based on natural language instructions) is fundamental for agents interacting with GUIs. Deploying this capability directly on resource-constrained devices like mobile phones is increasingly critical for GUI agents requiring lo…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

NORACL: Neurogenesis for Oracle-free Resource-Adaptive Continual Learning

2026-04-29 · Karthik Charan Raghunathan, Christian Metzner, Laura Kriener, Melika Payvand

Research Track A · General AI

In a continual learning setting, we require a model to be plastic enough to learn a new task and stable enough to not disturb previously learned capabilities. We argue that this dilemma has an architectural root. A finite network has limited representational and plastic resources, yet the required capacity depends on p…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

When Does Structure Matter in Continual Learning? Dimensionality Controls When Modularity Shapes Representational Geometry

2026-04-30 · Kathrin Korte, Joachim Winter Pedersen, Eleni Nisioti, Sebastian Risi

Research Track A

To preserve previously learned representations, continual learning systems must strike a balance between plasticity, the ability to acquire new knowledge, and stability. This stability-plasticity dilemma affects how representations can be reused across tasks: shared structure enables transfer when tasks are similar but…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

Beyond Forgetting in Continual Medical Image Segmentation: A Comprehensive Benchmark Study

2026-05-07 · Bomin Wang, Hangqi Zhou, Yibo Gao, Xiahai Zhuang

Research Track A · General AI

Continual learning (CL) is essential for deploying medical image segmentation models in clinical environments where imaging domains, anatomical targets, and diagnostic tasks evolve over time. However, continual segmentation still faces three main challenges. First, the scenarios for this task remain insufficiently stan…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

Mela: Test-Time Memory Consolidation based on Transformation Hypothesis

2026-05-11 · Lungchuan Chen

Research Track A · General AI

Memory consolidation, the process by which transient experiences are transformed into stable, structured representations, is a foundational organizing principle in the human brain, yet it remains largely unexplored as a design principle for modern sequence models. In this work, we leverage established neuroscientific t…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

2026-05-28 · Yilun Yao, Jiaming Pan, Elsie Dai, Peizhuang Cong, Yaoming Li, Tong Yang

Research Track A · General AI

Mixture-of-Experts (MoE) language models reduce per-token computation but still require storing and serving all experts, making deployment memory-intensive. Existing post-training compression methods mainly shrink this cost by pruning experts or merging their weights. We formulate post-training MoE compression as exper…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding

2026-06-03 · Muhammad Usama, Didier Stricker, Mohammad Sadil Khan, Muhammad Zeshan Afzal

General AI

Learning representations of CAD models is a largely open problem. While 3D representation learning has flourished around point clouds and meshes, the native format of CAD, boundary representations (BReps), which encode exact parametric surfaces, curves, and their topology, has received little attention as a representat…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

2026-06-04 · Hao Bai, Rui Yang, Chenlu Ye, Spencer Whitehead, Aviral Kumar, Tong Zhang

Research Track B · General AI

Training vision-language web agents with multi-step RL is compute-intensive, with two dominant forms of inefficiency: idle GPUs in synchronous RL, and trajectories that use more steps and tokens than necessary. We present AsyncWebRL, which addresses both. On the system side, an asynchronous design overlaps rollout, gra…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

2026-06-04 · Qiuyu Tian, Haojie Yin, Yingce Xia, Youyong Kong, Zequn Liu

General AI

AI research often requires decisions before future evidence exists: which bottleneck to attack, which direction to pursue, or where a project should be positioned. We introduce ForeSci, a temporally controlled benchmark for evaluating whether LLM agents can make such forward-looking research judgements from historical …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

2026-06-04 · Seyed Arshan Dalili, Mehrdad Mahdavi

Research Track A · General AI

Sparse Autoencoders (SAEs) are widely used for mechanistic interpretability in large language models, yet their formulation assigns each latent feature a single decoder direction, implicitly assuming features to be one-dimensional. We show that this assumption does not match the multi-dimensional structure of model fe…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

The Cold-Start Safety Gap in LLM Agents

2026-06-05 · Chung-En Sun, Linbo Liu, Tsui-Wei Weng

General AI

Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a session and become substantially safer after a few regular agentic tasks -- a phenomenon we term the cold-start safety gap. To study this systematically, we introduce Safety Ov…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 10.5

SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows

2026-06-06 · Amine El Hattami, Nicolas Chapados, Christopher Pal

Research Track B · General AI

AI agents increasingly turn past experience into reusable artifacts such as code, workflows, and procedural memories. Reuse can improve efficiency, but it also creates a lifecycle reliability problem: artifacts that succeed once may fail under environment drift, underspecified tasks, or changing task distributions, esp…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems

2026-06-07 · Suraj Ranganath, Anish Raghavendra

General AI

Enterprise property graphs vary widely in schema structure, internal terminology, domain assumptions, governance constraints, and user interaction patterns. A deployment-relevant Text2Cypher benchmark therefore reflects the questions users and agents actually ask of that graph. Creating such a benchmark is difficult be…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

2026-06-09 · Xinyu Zhou, Boyu Zhu, Yi Xu, Zhiwei Li, Yingfa Chen, Huiming Wang, Zhijiang Guo

General AI

Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including HypeNet and Jet-Nemotron, retrieval performance on Needle-In-A-Haystack (NIAH) deteriorate…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

2026-06-09 · Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik, Colin Raffel

General AI

Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly treating all attacks as equally costly. In practice, the computational expense of different attack strategies can vary by orders of magnitude. Consequently, ASR at a fixed …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

WorldOlympiad: Can Your World Model Survive a Triathlon?

2026-06-09 · Yuke Zhao, Wangbo Zhao, Weijie Wang, Zeyu Zhang, Dakai An, Akide Liu, Yinghao Yu, Jiasheng Tang, Fan Wang, Wei Wang, Bohan Zhuang

General AI

We introduce WorldOlympiad, a benchmark for diagnosing video-based world models across physical faithfulness, geometric consistency, and interaction fidelity. While existing benchmarks often focus on visual quality, semantic alignment, or short-term temporal coherence, they provide limited insight into whether generate…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

2026-06-11 · Yujun Zhou, Kehan Guo, Haomin Zhuang, Xiangqi Wang, Yue Huang, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Nuno Moniz, Nitesh V. Chawla, Xiangliang Zhang

General AI

Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between preference access and preference compliance. In tasks derived from anonymized real-user friction case…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization

2026-06-11 · Hexuan Yu, Chaoyu Zhang, Heng Jin, Shanghao Shi, Ning Zhang, Y. Thomas Hou, Wenjing Lou

Research Track B · General AI

Modern LLM-powered autonomous agents increasingly rely on rich user interface (UI) state observations to achieve reliable action grounding in complex digital environments. However, many deployments transmit the full UI state to remote inference servers even when most elements are irrelevant to the current task, which c…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

2026-06-12 · Chenxin Li, Zhengyao Fang, Zhengyang Tang, Pengyuan Lyu, Xingran Zhou, Xin Lai, Fei Tang, Liang Wu, Yiduo Guo, Weinong Wang, Junyi Li, Yi Zhang, Yang Ding, Huawen Shen, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang, Chengquan Zhang, Han Hu

General AI

Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

Who Flips? Self- and Cross-Model Counterarguments Reveal Answer Instability in LLMs

2026-06-14 · Nafiseh Nikeghbal, Amir Hossein Kargaran, Shaghayegh Kolli, Jana Diesner

General AI

Standard accuracy benchmarks are designed to test how closely large language models (LLMs) approach correct answers, but are not suitable for testing whether LLMs stick with a correct answer when that answer is challenged by a plausible counter-argument. We introduce a controlled protocol for evaluating answer stabilit…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

When Robots Sleep: Offline Skill Consolidation for Shared-Policy Robot Learning

2026-06-16 · Nethmi Jayasinghe, Diana Gontero, Amit Ranjan Trivedi

Research Track A · General AI

Robots that learn over long deployments must add new skills without losing the shared policy structure that makes earlier skills reusable. We study sequential robot skill learning, where previous trajectories and task losses may be unavailable, and the deployed policy must remain a single shared controller without task…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.5

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

2026-06-17 · Zijian Wang, Hanqi Li, Ziyue Yang, Zijian Hu, Shenghan Zuo, Yunzhe Zhang, Da Ma, Danyu Luo, Chenrun Wang, Jing Peng, Tiancheng Huang, Sijia Guo, Huayang Wang, Zichen Zhu, Senyu Han, Yilu Cao, Kai Yu, Lu Chen

General AI

AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspe…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

HANSEL: Extracting Breadcrumbs from Web Agent Trajectories for Interactive Verification

2026-06-17 · Yujin Zhang, Daye Nam

Research Track B · General AI

AI web agents can perform complex, multi-step tasks such as searching for products, comparing options, and making purchases on behalf of users. However, verifying the correctness of an agent's output remains difficult. Existing transparency mechanisms, including full trajectory logs, source links, screenshots, and LLM-…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.5

Scalable Behaviour Cloning on Browser Using via Skill Distillation

2026-06-30 · Kaisen Yang, Zheng Jiang, Yuzhao Peng, Houde Qian, Boshi Zhang, Youjie Zheng, Shijin Hong, Qingle Liu, Ruoyu Han, Bohan Lyu, Bingxiang He, Eren Cai, Calvin Xiao, Qinhuai Na

Research Track A · Research Track B · General AI

Internet users collectively perform an enormous range of skilled work through web browsers, from software development and document editing to search, forms, and enterprise workflows, making human browsing a highly scalable but under-exploited source of reusable browser skills. We argue that the bottleneck for browser a…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 10.4

World Value Models for Robotic Manipulation

2026-06-23 · Zhihao Wang, Jianxiong Li, Yu Cui, Yuan Gao, Xianyuan Zhan, Junzhi Yu, Xiao Ma

General AI

Generalist value models play a pivotal role in scaling robotic policy learning from large-scale, mixed-quality data. Mathematically, accurate value estimation demands deep temporal understanding, requiring models to both ground the current belief using historical context and plan over future outcomes. However, most exi…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.4

LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

2026-06-25 · Xinyu Wang, Chongbo Zhao, Fangneng Zhan, Yue Ma

General AI

Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: maintaining stable backgrounds and non-edited regions over time, and achieving the low latency required for real-time interactive scenarios. Meanwhile, recent streaming video generation methods are mostly deve…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 10.4

ZooClaw-FashionSigLIP2: Distilled Fine-tuning for Robust Fashion Retrieval

2026-06-26 · Siqiao Xue, Chunxue Xu

General AI

Adapting a foundation vision-language encoder to a specialized retrieval task creates a fundamental tradeoff: gains on the target distribution come at the cost of the foundation model's broad generalization, and fashion retrieval is a stringent instance of this problem. We present ZooClaw-FashionSigLIP2, a fashion-spec…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 10.3

SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce

2026-02-01 · Alberto Castelo, Zahra Zanjani Foumani, Ailin Fan, Keat Yang Koay, Vibhor Malik, Yuanzheng Zhu, Han Li, Meysam Feghhi, Ronie Uliana, Shuang Xie, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Lingyun Wang, Zhong Wu

Research Track B · General AI

A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model agents op…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 10.3

The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense

2026-03-24 · Qianlong Lan, Anuj Kaul

Research Track B · General AI

Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage spli…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.3

Natural-Language Agent Harnesses

2026-03-26 · Linyue Pan, Lexiao Zou, Shuo Guo, Jingchen Ni, Hai-Tao Zheng

General AI

Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead be externaliz…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.3

Social Hippocampus Memory Learning

2026-03-26 · Liping Yi, Zhiming Zhao, Qinghua Hu

General AI

Social learning highlights that learning agents improve not in isolation, but through interaction and structured knowledge exchange with others. When introduced into machine learning, this principle gives rise to social machine learning (SML), where multiple agents collaboratively learn by sharing abstracted knowledge.…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.3

Wan-Weaver: Interleaved Multi-modal Generation via Decoupled Training

2026-03-26 · Jinbo Xing, Zeyinzi Jiang, Yuxiang Tuo, Chaojie Mao, Xiaotang Gai, Xi Chen, Jingfeng Zhang, Yulin Pan, Zhen Han, Jie Xiao, Keyu Yan, Chenwei Xie, Chongyang Zhong, Kai Zhu, Tong Shen, Lianghua Huang, Yu Liu, Yujiu Yang

General AI

Recent unified models have made unprecedented progress in both understanding and generation. However, while most of them accept multi-modal inputs, they typically produce only single-modality outputs. This challenge of producing interleaved content is mainly due to training data scarcity and the difficulty of modeling …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.3

AMIGO: Agentic Multi-Image Grounding Oracle Benchmark

2026-03-30 · Min Wang, Ata Mahjoubfar

General AI

Agentic vision-language models increasingly act through extended interactions, but most evaluations still focus on single-image, single-turn correctness. We introduce AMIGO (Agentic Multi-Image Grounding Oracle Benchmark), a long-horizon benchmark for hidden-target identification over galleries of visually similar imag…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.3

C2RustXW: Program-Structure-Aware C-to-Rust Translation via Program Analysis and LLM

2026-03-30 · Yanyan Yan, Yang Feng, Jiangshan Liu, Di Liu, Zixi Liu, Hao Teng, Baowen Xu

General AI

The growing adoption of Rust for its memory safety and performance has increased the demand for effective migration of legacy C codebases. However, existing rule-based translators (e.g., C2Rust) often generate verbose, non-idiomatic code that preserves unsafe C semantics, limiting readability, maintainability, and pr…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.3

Less Is More? Selective Visual Attention to High-Importance Regions for Multimodal Radiology Summarization

2026-03-31 · Mst. Fahmida Sultana Naznin, Adnan Ibney Faruq, Mushfiqur Rahman, Niloy Kumar Mondal, Md. Mehedi Hasan Shawon, Md Rakibul Hasan

General AI

Automated radiology report summarization aims to distill verbose findings into concise clinical impressions, but existing multimodal models often struggle with visual noise and fail to meaningfully improve over strong text-only baselines in the FINDINGS → IMPRESSION transformation. We challenge two prevailing assum…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.3

Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs

2026-03-31 · Zhuowen Liang, Xiaotian Lin, Zhengxuan Zhang, Yuyu Luo, Haixun Wang, Nan Tang

General AI

Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, graph, or chunks) to support r…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.3

Video Models Reason Early: Exploiting Plan Commitment for Maze Solving

2026-03-31 · Kaleb Newman, Tyler Zhu, Olga Russakovsky

General AI

Video diffusion models exhibit emergent reasoning capabilities like solving mazes and puzzles, yet little is understood about how they reason during generation. We take a first step towards understanding this and study the internal planning dynamics of video models using 2D maze solving as a controlled testbed. Our inv…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models

2026-04-02 · Qiyao Zhang, Shuhua Zheng, Jianli Sun, Chengxiang Li, Xianke Wu, Zihan Song, Zhiyong Cui, Yisheng Lv, Yonglin Tian

General AI

Embodied visual tracking is crucial for Unmanned Aerial Vehicles (UAVs) executing complex real-world tasks. In dynamic urban scenarios with complex semantic requirements, Vision-Language-Action (VLA) models show great promise due to their cross-modal fusion and continuous action generation capabilities. To benchmark mu…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

VISTA: Visualization of Token Attribution via Efficient Analysis

2026-04-02 · Syed Ahmed, Bharathi Vokkaliga Ganesh, Jagadish Babu P, Karthick Selvaraj, Praneeth Talluri, Sanket Hingne, Anubhav Kumar, Anushka Yadav, Pratham Kumar Verma, Kiranmayee Janardhan, Mandanna A N

General AI

Understanding how Large Language Models (LLMs) process information from prompts remains a significant challenge. To shed light on this "black box," attention visualization techniques have been developed to capture neuron-level perceptions and interpret how models focus on different parts of input data. However, many ex…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models

2026-04-03 · Yunfei Bai, Amit Dhanda, Shekhar Jain

General AI

The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension, particularly for Chart Question Answering (CQA) tasks involving complex data vi…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

AnyUser: Translating Sketched User Intent into Domestic Robots

2026-04-06 · Songyuan Yang, Huibin Tan, Kailun Yang, Wenjing Yang, Shaowu Yang

General AI

We introduce AnyUser, a unified robotic instruction system for intuitive domestic task instruction via free-form sketches on camera images, optionally with language. AnyUser interprets multimodal inputs (sketch, vision, language) as spatial-semantic primitives to generate executable robot actions requiring no prior map…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

ClawBench: Can AI Agents Complete Everyday Online Tasks?

2026-04-09 · Yuxuan Zhang, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao, Xuan Lu, Wendong Xu, Yunzhuo Hao, Songcheng Cai, Xiaochen Wang, Huaisong Zhang, Xian Wu, Yi Lu, Minyi Lei, Kai Zou, Huifeng Yin, Ping Nie, Liang Chen, Dongfu Jiang, Wenhu Chen, Kelsey R. Allen

General AI

AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that people need to accom…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Distorted or Fabricated? A Survey on Hallucination in Video LLMs

2026-04-14 · Yiyang Huang, Yitian Zhang, Yizhou Wang, Mingyuan Zhang, Liang Shi, Huimin Zeng, Yun Fu

General AI

Despite significant progress in video-language modeling, hallucinations remain a persistent challenge in Video Large Language Models (Vid-LLMs), referring to outputs that appear plausible yet contradict the content of the input video. This survey presents a comprehensive analysis of hallucinations in Vid-LLMs and intro…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness

2026-04-14 · Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram

General AI

Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness when trivially constrained? We show that simple lexical constraints (banning a single punctuation character or common word) cause instruction-tuned LLMs to collapse their responses, losing 14–48% of compre…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Parallax: Why AI Agents That Think Must Never Act

2026-04-14 · Joel Fokou

General AI

Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise applications will embed AI copilots by the end of 2026. As agents gain the ability to execute real-world actions (reading files, running commands, making network requests, modify…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

2026-04-16 · Marcel Wagenländer, Otto White, Britannio Jarrett, Pedro Silvestre, Yanda Tao, Guo Li, Huanzhou Zhu, Llúis Vilanova, Peter Pietzuch

General AI

Agentic workflows carry out complex tasks by orchestrating multiple large language models (LLMs) and tools. Serving such workflows at a target throughput with low latency is challenging because they can be defined using arbitrary agentic frameworks and exhibit unpredictable execution times: execution may branch, fan-ou…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

The Agentification of Scientific Research: A Physicist's Perspective

2026-04-16 · Xiao-Liang Qi

General AI

This article argues that the most important significance of the AI revolution, especially the rise of large language models, lies not simply in automation, but in a fundamental change in how complex information and human know-how are carried, replicated, and shared. From this perspective, AI for Science is especially i…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation

2026-04-17 · Yi Lin, Yihao Ding, Yonghui Wu, Yifan Peng

General AI

Automated 3D radiology report generation often suffers from clinical hallucinations and a lack of the iterative verification found in human practice. While recent Vision-Language Models (VLMs) have advanced the field, they typically operate as monolithic "black-box" systems without the collaborative oversight character…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

2026-04-20 · Xirui Li, Ming Li, Derry Xu, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh, Tianyi Zhou

General AI

Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diverse, verified environments on demand. To this end, we introduce ClawEnvKit, an aut…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering

2026-04-20 · Manan Gupta, Dhruv Kumar

General AI

Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce Latent Phase-Shift Rollback (LPSR): at each generation step, we monitor the residual stream at a critical layer l_crit,…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Multilingual Training and Evaluation Resources for Vision-Language Models

2026-04-20 · Daniela Baiamonte, Elena Fano, Matteo Gabburo, Stefano Simonazzi, Leonardo Rigutini, Andrea Zugarini

General AI

Vision Language Models (VLMs) have achieved rapid progress in recent years. However, despite their growth, VLM development is heavily grounded in English, leading to two main limitations: (i) the lack of multilingual and multimodal datasets for training, and (ii) the scarcity of comprehensive evaluation benchmarks acro…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning

2026-04-21 · Xianming Li, Zongxi Li, Tsz-fung Andrew Lee, Jing Li, Haoran Xie, Qing Li

General AI

Parameter-efficient fine-tuning (PEFT) reduces the training cost of full-parameter fine-tuning for large language models (LLMs) by training only a small set of task-specific parameters while freezing the pretrained backbone. However, existing approaches, such as Low-Rank Adaptation (LoRA), achieve adaptation by inserti…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Diagnosing CFG Interpretation in LLMs

2026-04-22 · Hanqi Li, Lu Chen, Kai Yu

General AI

As LLMs are increasingly integrated into agentic systems, they must adhere to dynamically defined, machine-interpretable interfaces. We evaluate LLMs as in-context interpreters: given a novel context-free grammar, can LLMs generate syntactically valid, behaviorally functional, and semantically faithful outputs? We intr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Agentic Artificial Intelligence in Finance: A Comprehensive Survey

2026-04-23 · Irene Aldridge, Jolie An, Riley Burke, Michael Cao, Chia-Yi Chien, Kexin Deng, Ruipeng Deng, Yichen Gao, Olivia Guo, Shunran He, Zheng Li, George Lin, Weihang Lin, Percy Lyu, Alex Ng, Qi Wang, Hanxi Xiao, Dora Xu, Yuanyuan Xue, Sheng Zhang, Sirui Zhang, Yun Zhang, Sirui Zhao, Xiaolong Zhao, Yihan Zhao, Waner Zheng

General AI

The emergence of agentic artificial intelligence (AI) represents a fundamental transformation in financial markets, characterized by autonomous systems capable of reasoning, planning, and adaptive decision-making with minimal human intervention. This comprehensive survey synthesizes recent advances in agentic AI across…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Long-Horizon Manipulation via Trace-Conditioned VLA Planning

2026-04-23 · Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu

General AI

Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated ta…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

2026-04-23 · Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong, Jaehong Kim, Meeyoung Cha

General AI

Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two e…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models

2026-04-23 · Naheed Rayhan, Sohely Jahan

General AI

Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection (TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial intent across …

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

ATRS: Adaptive Trajectory Re-splitting via a Shared Neural Policy for Parallel Optimization

2026-04-24 · Jiajun Yu, Guodong Liu, Li Wang, Pengxiang Zhou, Wentao Liu, Yin He, Chao Xu, Fei Gao, Yanjun Cao

General AI

Parallel trajectory optimization via the Alternating Direction Method of Multipliers (ADMM) has emerged as a scalable approach to long-horizon motion planning. However, existing frameworks typically decompose the problem into parallel subproblems based on a predefined fixed structure. Such structural rigidity often cau…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks

2026-04-24 · Longju Bai, Zhemin Huang, Xingyao Wang, Jiao Sun, Rada Mihalcea, Erik Brynjolfsson, Alex Pentland, Jiaxin Pei

General AI

The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questions naturally arise: (1) Where do AI agents spend the tokens? (2) Which models are more token-efficient? and (3) Can agen…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Long-Context Aware Upcycling: A New Frontier for Hybrid LLM Scaling

2026-04-27 · Parsa Ashrafi Fashi, Utkarsh Saxena, Mehdi Rezagholizadeh, Aref Jafari, Akash Haridas, Mingyu Yang, Vansh Bhatia, Guihong Li, Vikram Appia, Emad Barsoum

General AI

Hybrid sequence models that combine efficient Transformer components with linear sequence modeling blocks are a promising alternative to pure Transformers, but most are still pretrained from scratch and therefore fail to reuse existing Transformer checkpoints. We study upcycling as a practical path to convert pretraine…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications

2026-04-27 · Zhenyu Zhao, Aparna Balagopalan, Adi Agrawal, Dilshoda Yergasheva, Waseem Alshikh, Daniel M. Bikel

General AI

Given the increased use of LLMs in financial systems today, it becomes important to evaluate the safety and robustness of such systems. One failure mode that LLMs frequently display in general domain settings is that of sycophancy. That is, models prioritize agreement with expressed user beliefs over correctness, leadi…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Machine Collective Intelligence for Explainable Scientific Discovery

2026-04-30 · Gyoung S. Na, Chanyoung Park

Research Track A · General AI

Deriving governing equations from empirical observations is a longstanding challenge in science. Although artificial intelligence (AI) has demonstrated substantial capabilities in function approximation, the discovery of explainable and extrapolatable equations remains a fundamental limitation of modern AI, posing a ce…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

PhyCo: Learning Controllable Physical Priors for Generative Motion

2026-04-30 · Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan, Manmohan Chandraker

General AI

Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded co…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Generating Statistical Charts with Validation-Driven LLM Workflows

2026-05-01 · Pavlin G. Poličar, Andraž Pevcin, Blaž Zupan

General AI

Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-ans…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

2026-06-04 · Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou, Gio Huh, Robert Griffin, Georgia Gkioxari, Aaron Ames

General AI

For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead pr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill

2026-06-04 · Mehmet Iscan

General AI

Large language models increasingly write, review, and judge code, and a fast-growing practice equips them with prompt 'skills' that ask the model to reason like a scientist. A prominent example tells the model to act as a Popperian falsificationist, and such skills are reported to improve generated code. But these gain…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models

2026-06-04 · Ziwen Kan, Yishuo Chen, Kecheng Li, Andrew Wen, Xiaomeng Wang, Liwei Wang, Jihao Duan, Song Wang, Hongfang Liu, Tianlong Chen

General AI

Time series foundation models (TS-FMs) aim to learn generalizable temporal representations that can be adapted to a wide range of downstream tasks. In real-world multimodal settings, time series are frequently affected by temporal misalignment and partial modality missingness, where different modalities are observed at…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

2026-06-04 · Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang, Furu Wei

General AI

Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse methods typically pro…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

PTL-Diffusion: Manifold-Aware Diffusion with Periodic Terminal Laws

2026-06-08 · Danqi Zhuang, Jisui Huang, Xiaoyue Xi, Andrew Kiggins, Xiaojie Wang, Ke Chen, Yue Wu

General AI

Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analytically convenient and empirically powerful, it provides little explicit structure for data concentrated near low-dimensional manifolds, where different regions…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Flaws in the LLM Automation Narrative

2026-06-09 · George Perrett, Javae Elliott, Jennifer Hill, Marc Scott

General AI

Large Language Models (LLMs) are increasingly described as performing at the level of human experts on knowledge economy tasks. These claims are primarily based on how LLMs perform on benchmarking tasks that measure average performance across standardized datasets. Primary limitations of many benchmarking tasks are tha…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Listen, Look, and Learn: Learning Without Forgetting through SAM-Audio

2026-06-09 · Avi Gupta, Nilotpal Sinha, Vishnu Raj, Sambuddha Saha, Pratik Joshi, Koteswar Rao Jerripothula, Tammam Tillo

General AI

Class-Incremental Learning (CIL) aims to continuously learn new classes without forgetting previously acquired knowledge. While recent CIL advances have spurred significant interest across various modalities, the audio-visual setting remains underexplored. Furthermore, although foundational multimodal models like SAM-A…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Multimodal Brain Tumour Classification Using Feature Fusion

2026-06-09 · Wajih ul Islam, Muhammad Yaqoob, Javed Ali Khan, Volker Steuber

General AI

Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone, failing to replicate the clinicians' multimodal reasoning. We explo…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Next Forcing: Causal World Modeling with Multi-Chunk Prediction

2026-06-09 · Gangwei Xu, Qihang Zhang, Jiaming Zhou, Xing Zhu, Yujun Shen, Xin Yang, Yinghao Xu

General AI

Autoregressive video generation has emerged as a powerful paradigm for World Action Models (WAMs). However, existing approaches suffer from slow training convergence and limited converged accuracy, particularly at high frame rates, as the training supervision is confined to the current chunk without explicit signals ab…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

OpenPCC: Open and Confidential LLM Serving on Commodity TEEs

2026-06-09 · Haoling Zhou, Shixuan Zhao, Chao Wang, Zhiqiang Lin

General AI

Generative AI applications such as personal AI agents, image generators, and chat assistants offer advanced capabilities to improve user experience. Behind the scenes, Large Language Models (LLMs) that power these services require a massive amount of computation and are usually deployed in the cloud, available as APIs,…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Bridging the Usability Gap: Lessons from Interpreting Studies for Machine Interpreting Design

2026-06-14 · Claudio Fantinuoli

General AI

Machine interpreting (MI), the live, real-time branch of speech translation, has achieved remarkable progress on standard benchmarks, with some systems approaching human parity on textual fidelity. Yet the user experience remains far inferior to interpreter-mediated communication, revealing what we term the accur…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Agent trajectories as programs: fingerprinting and programming coding-agent behavior

2026-06-15 · Hamidah Oderinwale

General AI

Benchmark scores tell you what an agent got right; they do not tell you how it got there. In this work, we introduce methods for comparing agents procedurally in different contexts, where the model, tasks, and approaches vary. We compare ten agents and find that they are identifiable by their behavioral habits, which w…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Bayesian Inference and Decision Audits for Public Archives of Frontier AI Evaluations

2026-06-15 · Yanan Long

General AI

Public AI evaluations are often read as terminal leaderboards, yet the underlying evidence is a selective time series shaped by reporting rules, benchmark revisions, and missingness. Repeated public archives for LiveBench and Open LLM Leaderboard v2 serve as the primary longitudinal record; LMArena provides a preferenc…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Geometric Action Model for Robot Policy Learning

2026-06-15 · Jisang Han, Seonghu Jeon, Jaewoo Jung, René Zurbrügg, Honggyu An, Tifanny Portela, Marco Hutter, Marc Pollefeys, Seungryong Kim, Sunghwan Hong

General AI

Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors from large-scale foundation models, but the…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction

2026-06-16 · Md Tawkat Islam Khondaker, Raymond Li, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Issam H. Laradji

General AI

Deep research (DR) systems are increasingly used for complex information-seeking tasks, but existing works mainly focus on generating reports and summaries. In contrast, many enterprise tasks require an agent to identify concrete workflows, that is, sequences of action steps. For example, rather than summarizin…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Invertible Neural Network Adapter for One-Step Flow Matching in Robot Manipulation

2026-06-17 · Yu Zhang, Kangyi Ji, Yongxiang Zou, Rongtao Xu, Feng Zheng, Long Cheng

General AI

This paper presents an invertible neural network adapter for general robotic manipulation, designed to generate precise high-dimensional actions conditioned on multimodal observations, including visual, linguistic, and proprioceptive inputs, through a one-step denoising process. Built upon a flow-matching formulation, …

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

Structured Inference with Large Language Gibbs

2026-06-17 · Sanghyeok Choi, Henry Gouk, Esmeralda S. Whitammer

General AI

The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a difficult inference problem. We propose Large Language Gibbs, a scheme for structured probabilist…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.3

X+Slides: Benchmarking Audience-Conditioned Slide Generation

2026-06-17 · Haodong Chen, Xuanhe Zhou, Wei Zhou, Xinyue Shao, Yanbing Zhu, Bo Wang, Jiawei Hong, Anya Jia, Fan Wu

General AI

Automatically generating slide decks from source documents is an important application of large language models (LLMs). Existing benchmarks primarily assess slide completeness and technical depth, while overlooking the target audience as a critical real-world factor. For instance, specialists demand rigorous proofs, wh…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.2

DiffusionBench: On Holistic Evaluation of Diffusion Transformers

2026-06-23 · Xingjian Leng, Jaskirat Singh, Zhanhao Liang, Ethan Smith, Martin Bell, Aninda Saha, Yuhui Yuan, Liang Zheng

General AI

Diffusion transformer (DiT) research on image generation has converged to a single evaluation setup: class-conditional generation on ImageNet. While methods improve the FID and related metrics, it is increasingly unclear whether they reflect real progress in generative modeling. The natural alternative, i.e., text-to-i…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.2

FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

2026-06-23 · Orest Kupyn, Goutam Bhat, Philipp Henzler, Fabian Manhardt, Christian Rupprecht, Federico Tombari

General AI

Generating explorable 3D scenes from a single image requires strong generative priors and accurate geometric representations suitable for downstream use. Current video diffusion models offer high-quality generation and implicitly encode multi-view geometric structure in latent space. However, existing feedforward laten…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.0

Safe and Scalable Web Agent Learning via Recreated Websites

2026-03-11 · Hyungjoo Chae, Jungsoo Park, Alan Ritter

Research Track B · General AI

Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites in…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 10.0

STEM Agent: A Self-Adapting, Tool-Enabled, Extensible Architecture for Multi-Protocol AI Agent Systems

2026-03-22 · Alfred Shen, Aaron Shen

Research Track A · General AI

Current AI agent frameworks commit early to a single interaction protocol, a fixed tool integration strategy, and static user models, limiting their deployment across diverse interaction paradigms. To address these constraints, we introduce STEM Agent (Self-adapting, Tool-enabled, Extensible, Multi-agent), a modular ar…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 10.0

Learn by Surprise, Commit by Proof

2026-04-02 · Kang-Sin Choi

Research Track A · General AI

We propose LSCP, a self-gated post-training framework for autonomous knowledge acquisition: learning only what a model does not already know, verified against what it does know, at a strength proportional to conviction, with no external oracle. When a passage produces anomalously high per-token loss, LSCP flags it, gen…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.0

Fast Heterogeneous Serving: Scalable Mixed-Scale LLM Allocation for SLO-Constrained Inference

2026-04-08 · Jiaming Cheng, Duong Tung Nguyen

Research Track A · General AI

Deploying large language model (LLM) inference at scale requires jointly selecting base models, provisioning heterogeneous GPUs, configuring parallelism, and distributing workloads under tight latency, accuracy, and budget constraints. Exact mixed-integer linear programming (MILP) approaches guarantee optimality but sc…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.0

A Wasserstein Geometric Framework for Hebbian Plasticity

2026-04-17 · Ulrich Tan

Research Track A · General AI

We introduce the Tan-HWG framework (Hebbian-Wasserstein-Geometry), a geometric theory of Hebbian plasticity in which memory states are modeled as probability measures evolving through Wasserstein minimizing movements. Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition,…

Review: pending
Role: unreviewed
Read: now

arxiv Score 10.0

CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness

2026-04-30 · Haofei Yu, Yining Zhao, Lenore Blum, Manuel Blum, Paul Pu Liang

Research Track B · General AI

Despite remarkable advances, today's AI systems remain narrow in scope, falling short of the flexible, adaptive, and multisensory intelligence that characterizes human capabilities. This gap has fueled longstanding debates about whether AI might one day achieve human-like generality or even consciousness, and whether t…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 10.0

A Compound AI Agent for Conversational Grant Discovery

2026-05-04 · Zhisheng Tang, Mayank Kejriwal

Research Track B · General AI

Research funding discovery remains fundamentally fragmented: researchers navigate disparate agency portals (e.g., in the United States, NSF, NIH, DARPA, Grants.gov, and many others) with heterogeneous interfaces, search capabilities, and data schemas. We present a compound AI system that unifies this landscape through …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.0

PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

2026-05-07 · Xinmiao Huang, Jinwei Hu, Rajarshi Roy, Changshun Wu, Yi Dong, Xiaowei Huang

Research Track B · General AI

Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. We introduce PrefixG…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.0

WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation

2026-05-08 · Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao, Tianqing Zhu, Xiaohua Jia

Research Track B · General AI

Browser agents are increasingly deployed in long-horizon tasks, which require executing extended action chains to accomplish user goals. However, this prolonged execution process provides attackers with more opportunities to inject malicious instructions. Existing prompt injection attacks against browser agents expose …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.0

GLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extraction

2026-05-11 · Ihor Stepanov, Oleksandr Lukashov, Mykhailo Shtopko, Vivek Kalyanarangan

General AI

Joint named entity recognition (NER) and relation extraction (RE) is a fundamental task in natural language processing for constructing knowledge graphs from unstructured text. While recent approaches treat NER and RE as separate tasks requiring distinct models, we introduce GLiNER-Relex, a unified architecture that ex…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.0

SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents

2026-05-14 · Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ted Chaiwachirasak, Han Li, Lingyun Wang

Research Track B · General AI

LLM-based web agents can navigate live storefronts, yet they often collapse to a single "average buyer" policy, failing to capture the heterogeneous and distributional nature of real buyer populations. Existing personalization methods rely on hand-crafted prompt-based personas that are brittle, difficult to scale, cont…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 10.0

A Query Engine for the Agents

2026-05-27 · Kenny Daniel

Research Track B · General AI

The fastest-growing data in production today is unstructured text: agent traces, chat logs, reasoning chains, model outputs. People want to analyze it, and the questions worth asking ("show me where the agent got confused") cannot be answered by SQL alone, since text is not queryable without a model in the query path. …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.0

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

2026-05-28 · Vaishali Senthil, Ashutosh Hathidara, Sebastian Schreiber

General AI

Tool retrieval over large API catalogs is a core bottleneck for LLM agents: user queries arrive in colloquial, often underspecified language, while the catalog uses technical API vocabulary that no fixed encoder can bridge on its own. The two dominant training approaches, contrastive encoder fine-tuning and HyDE-style …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.0

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

2026-05-28 · Yue Zhang, Zun Wang, Han Lin, Yonatan Bitton, Idan Szpektor, Mohit Bansal

General AI

Spatial reasoning is a fundamental capability for vision-language models (VLMs) deployed in real-world environments. However, visual observations are inherently limited representations of a 3D world: occlusion can render objects invisible, and perspective can make geometric properties misleading. Despite this, existing…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.0

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

2026-05-28 · Cheolhong Min, Jaeyun Jung, Daeun Lee, Hyeonseong Jeon, Yu Su, Jonathan Tremblay, Chan Hee Song, Jaesik Park

General AI

Vision-language models (VLMs) achieve strong performance on spatial reasoning benchmarks, yet it remains unclear whether this reflects structured 3D understanding or reliance on statistical shortcuts in natural images. We introduce a representation-level analysis framework that constructs minimal contrastive pairs to m…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 10.0

TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning

2026-06-04 · Marius Dragoi, Ioana Pintilie, Alexandra Dragomir, Antonio Barbalau, Florin Brad

Research Track A

Parameter-efficient finetuning methods based on spectral decomposition have enabled progress in Continual Learning. In this paper we introduce TailLoR, which utilizes the singular bases U and V of the pre-trained weights as a fixed reference frame to learn a low-rank update applied to the singular value matrix. A soft …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 10.0

EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies

2026-06-18 · Ganlin Yang, Zhangzheng Tu, Yuqiang Yang, Sitong Mao, Junyi Dong, Tianxing Chen, Jiaqi Peng, Jing Xiong, Jiafei Cao, Jifeng Dai, Wengang Zhou, Yao Mu, Tai Wang

General AI

Memory remains a critical bottleneck for long-horizon robotic manipulation, as standard Vision-Language-Action (VLA) policies often fail when task-relevant cues become occluded or unobservable over time. While existing memory-augmented methods utilize historical context, they either suffer from severe information bottl…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 10.0

UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating

2026-06-19 · Jiehui Huang, Yuechen Zhang, Bin Xia, Jiahao Wang, Xu He, Zhenchao Tang, Meng Chu, Xin Tao, Pengfei Wan, Jiaya Jia

General AI

Generating a coherent multi-shot video requires structured cross-shot memory. Subject appearance, scene context, and speaker identity must persist across cuts. Existing approaches either train end-to-end over fixed-length sequences and cannot scale, generate shot-by-shot with memory banks that grow linearly, or orchest…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 10.0

Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

2026-06-27 · Yongjin Yang, Jiarui Liu, Yinghui He, Lechen Zhang, Bernhard Schölkopf, Zhijing Jin

General AI

Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 10.0

DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model

2026-06-29 · Daniyel Ayupov, Artur Markov-Tsoy

General AI

We present DreamForge-World 0.1 Preview, a preview foundational world model for real-time interactive world simulation. The system adapts the LongLive 1 autoregressive video stack, itself derived from Wan2.1-T2V-1.3B, with a residual action pathway inspired by the Matrix-Game family. DreamForge-World 0.1 Preview focuse…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 10.0

The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

2026-06-29 · Yuxi Wang, Chengkai Jin, Yufei Liu, Wenqi Ouyang, Tianyi Wei, Zhiwei Zeng, Siyuan Huang, Zhiqi Shen, Xingang Pan

General AI

4D hand motion reconstruction from egocentric video is bottlenecked by clear limitations of existing methods: image-based pipelines depend on a detector that fails under heavy occlusion, while video-based methods rely on temporal modules learned only from scarce hand-pose annotations, a narrow signal insufficient to mo…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.8

CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation

2026-04-30 · Andac Demir, Erik W. Anderson, Jeremy L. Jenkins, Srayanta Mukherjee

General AI

In this work, we introduce CellxPert, a scalable multimodal foundation model that unifies single-cell and spatial multi-omics within a common representation space. CellxPert jointly encodes transcriptomic (scRNA-seq), chromatin-accessibility (ATAC-seq), and surface-proteomic (CITE-seq) measurements, while directly inco…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.8

GIFT: Guided Fine-Tuning and Transfer for Enhancing Instruction-Tuned Language Models

2026-05-02 · Zhiwen Ruan, Yichao Du, Jianjie Zheng, Longyue Wang, Yun Chen, Peng Li, Jinsong Su, Yang Liu, Guanhua Chen

General AI

A promising paradigm for adapting instruction-tuned language models is to learn task-specific updates on a pretrained base model and subsequently merge them into the instruction-tuned model. However, existing approaches typically treat the instruction-tuned model as a passive target that is only involved at the final m…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning for Speech and Reasoning Tasks

2026-05-03 · Zongqian Li, Yixuan Su, Han Zhou, Zihao Fu, Nigel Collier

General AI

Parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) have become essential for deploying large language models, yet their static parameter allocation remains suboptimal for inputs of varying complexity. We present Flexi-LoRA, a novel framework that dynamically adjusts LoRA ranks based on input comple…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

TrajRAG: Retrieving Geometric-Semantic Experience for Zero-Shot Object Navigation

2026-05-03 · Yiyao Wang, Sixian Zhang, Keming Zhang, Xinhang Song, Songjie Du, Shuqiang Jiang

General AI

Existing zero-shot Object Goal Navigation (ObjectNav) methods often exploit commonsense knowledge from large language or vision-language models to guide navigation. However, such knowledge arises from internet-scale text rather than embodied 3D experience, and episodic observations collected during navigation are typic…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

AlbumFill: Album-Guided Reasoning and Retrieval for Personalized Image Completion

2026-05-04 · Yu-Ju Tsai, Brian Price, Qing Liu, Luis Figueroa, Daniil Pakhomov, Zhihong Ding, Scott Cohen, Ming-Hsuan Yang

General AI

Personalized image completion aims to restore occluded regions in personal photos while preserving identity and appearance. Existing methods either rely on generic inpainting models that often fail to maintain identity consistency, or assume that suitable reference images are explicitly provided. In practice, suitable …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

Bolek: A Multimodal Language Model for Molecular Reasoning

2026-05-04 · Frederic Grabowski, Jacek Szczerbiński, Maciej Jaśkowski, Kalina Jasińska-Kobus, Paweł Dąbrowski-Tumański, Tomasz Jetka, Bartosz Topolski

General AI

Molecular property models increasingly support high-stakes drug-discovery decisions, but their outputs are often difficult to audit: classical predictors return scores without rationale, while language models can produce fluent explanations weakly grounded in the input molecule. We introduce Bolek, a compact multimodal…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation

2026-05-04 · Danil Tokhchukov, Veronika Morozova, Gonzalo Ferrer

General AI

Traditional Simultaneous Localization and Mapping (SLAM) algorithms rely heavily on the static environment assumption, which severely limits their applicability in real-world spaces populated by moving entities, such as pedestrians. In this work, we propose DynoSLAM, a tightly-coupled Dynamic GraphSLAM architecture tha…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

2026-05-07 · Daniel Zheng, Ingrid von Glehn, Yori Zwols, Iuliya Beloshapka, Lars Buesing, Daniel M. Roy, Martin Wattenberg, Bogdan Georgiev, Tatiana Schmidt, Andrew Cowie, Fernanda Viegas, Dimitri Kanevsky, Vineet Kahlon, Hartmut Maennel, Sophia Alj, George Holland, Alex Davies, Pushmeet Kohli

General AI

We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature search, computation…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

Recursive Agent Optimization

2026-05-07 · Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig

General AI

We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer c…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

Controllability in preference-conditioned multi-objective reinforcement learning

2026-05-11 · Pau de las Heras Molins, Beyazit Yalcinkaya, Lasse Peters, David Fridovich-Keil, Georgios Bakirtzis

General AI

Multi-objective reinforcement learning (MORL) allows a user to express preference over outcomes in terms of the relative importance of the objectives, but standard metrics cannot capture whether changes in preference reliably change the agent's behavior in the intended way, a property termed controllability. As a resul…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

2026-05-12 · Miaosen Zhang, Xiaohan Zhao, Zhihong Tan, Zhou Huoshen, Yijia Fan, Yifan Yang, Kai Qiu, Bei Liu, Justin Wagle, Chenzhong Yin, Mingxi Cheng, Ji Li, Qi Dai, Chong Luo, Xu Yang, Xin Geng, Baining Guo

Research Track B · General AI

Computer-use agents (CUAs) automate on-screen work, as illustrated by GPT-5.4 and Claude. Yet their reliability on complex, low-frequency interactions is still poor, limiting user trust. Our analysis of failure cases from advanced models suggests a long-tail pattern in GUI operations, where a relatively small fraction …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

Solve the Loop: Attractor Models for Language and Reasoning

2026-05-12 · Jacob Fein-Ashley, Paria Rashidinejad

General AI

Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, improving language modeling and reasoning. Yet recurrent architectures remain unstable to train, costly to optimize and deploy, and constrained to small, fixed recurrence depths. We intro…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

Self-Evolving Multi-Agent Systems via Decentralized Memory

2026-05-21 · Guangya Hao, Yunbo Long, Zhuokai Zhao

General AI

Self-evolving multi-agent systems (MAS) have emerged as a promising route to LLM agents that continually improve from experience, with persistent memory at their foundation. However, existing designs almost exclusively adopt a centralized repository shared across agents, incurring communication and coordination overhea…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.8

Agentic Proving for Program Verification

2026-05-22 · Alessandro Sosso, Akhil Arora, Bas Spitters

General AI

Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation. Our results…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

Leveraging Foundation Models for Causal Generative Modeling

2026-05-22 · Aneesh Komanduri, Xintao Wu

General AI

Causal generative modeling is essential for developing reliable and transparent AI systems capable of counterfactual reasoning. While existing approaches focus on integrating causal constraints during the training of generative models, they often lack a unified framework to leverage the zero-shot reasoning capabilities…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

2026-05-22 · Jianshu Zhang, Yijiang Li, Huifeixin Chen, Haoran Lu, Letian Xue, Bingyang Wang, Han Liu

General AI

Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce numerical outputs such as action magnitudes and spatial coordinates. Although these numbers appear meaningful, it remains unclear whether these numerical outputs are genuinely grounded in spatial perception. Theref…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

Hierarchical Synthetic Tabular Data Generation: A Hybrid Top-Down and Bottom-Up Framework

2026-05-27 · Junfeng Nie, Alvin Jin, Xiaohui Chen

General AI

Existing approaches for synthetic tabular data generation are based on either purely generative models or LLMs, both of which struggle with data heterogeneity, logical consistency, rare-event coverage, and robustness in low-data regimes. In this paper, we propose a hierarchical hybrid top-down and bottom-up (H-TDBU) fr…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.8

Benchmarking Single-Factor Physical Video-to-Audio Generation

2026-05-28 · Tingle Li, Siddharth Gururani, Kevin J. Shih, Gantavya Bhatt, Sang-gil Lee, Zhifeng Kong, Arushi Goel, Gopala Anumanchipalli, Ming-Yu Liu

General AI

Generative video-to-audio (V2A) models produce highly plausible soundtracks, but it remains unclear whether they capture the underlying physical processes. Existing evaluations emphasize perceptual realism and overlook physical correctness under controlled interventions. In this paper, we introduce FlatSounds, a benchm…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

2026-05-28 · Hidir Yesiltepe, Jiazhen Hu, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, Hoda Eldardiry, Pinar Yanardag

General AI

Long-rollout causal video diffusion has converged on a fixed-size sliding-window KV cache, with recent progress innovating within this layout by changing which tokens occupy the window or how their positions are encoded. The per-head KV layout itself, a dominant contributor to streaming memory and latency, has been mos…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

When, why, and how do diffusion posterior samplers fail? A finite-sample lens

2026-05-28 · Benjamin A. Burns, Sara Fridovich-Keil

General AI

Diffusion models have excellent capacity to model complex distributions of natural data, which has made them a popular and effective choice for posterior sampling in imaging inverse problems. Existing methods can incorporate any measurement model at inference time but must use an inexact approximation for the likelihoo…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

If LLMs Have Human-Like Attributes, Then So Does Age of Empires II

2026-05-29 · Adrian de Wynter

General AI

Much research has been carried out on large language models (LLMs) and LLM-powered agentic workflows. However, many works within the field state emergence of, ascribe to, or assume, generalised anthropomorphic attributes to them (e.g., morality or understanding of natural language). Our goal is not to argue in favour o…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

2026-05-29 · Olaf Dünkel, Basavaraj Sunagad, Haoran Wang, David T. Hoffmann, Christian Theobalt, Adam Kortylewski

General AI

Measuring structured object understanding in vision foundation models remains challenging due to inconsistent evaluation protocols and limited part-level supervision. Semantic correspondence (SC) evaluates this capability by testing whether object parts can be matched across instances and categories under large variati…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.8

AI Premium

2026-06-29 · Nicola Borri, Yukun Liu, Aleh Tsyvinski

General AI

Using 380 trillion tokens of realized AI consumption across more than four hundred large language models from the licensed proprietary OpenRouter dataset covering approximately 2 percent of current global monthly AI token consumption, we analyze how AI affects firms, markets, and workers. Leveraging the unprecedented s…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.8

C$^{2}$R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders

2026-06-29 · Haoran Jin, Xiting Wang, Shijie Ren, Hong Xie, Defu Lian

General AI

Sparse Autoencoders (SAEs) are widely used to interpret large language models by decomposing activations into sparse, human-understandable features, but scaling to large dictionaries exposes fundamental challenges. Systematic studies reveal pervasive feature splitting that fragments coherent concepts into non-atomic la…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.8

Learning from Reliable Latent Prompts for Visual Recognition with Missing Modalities

2026-06-29 · Taixi Chen, Nancy Guo

General AI

Large-scale multimodal models (LMMs) have achieved superior performance in visual recognition by synergizing information across diverse, massive-scale paired modalities. In real-world scenarios, however, missing-modality inputs are ubiquitous, causing models optimized for modality-complete data to exhibit precipitous p…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.8

Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

2026-06-29 · Lei Bai, Zongsheng Cao, Yang Chen, Zhiyao Cui, Shangheng Du, Yue Fan, Shiyang Feng, Zijie Guo, Haonan He, Liang He, Xiaohan He, Shuyue Hu, Yusong Hu, Songtao Huang, Yichen Jiang, Hao Li, Xin Li, Dahua Lin, Weihao Lin, Fenghua Ling, Dongrui Liu, Zhuo Liu, Runmin Ma, Chunjiang Mu, Haoyang Peng, Tianshuo Peng, Jinxin Shi, Luohe Shi, Boyuan Sun, Zelin Tan, Shengji Tang, Qianyi Wang, Yiming Wu, Yi Xie, Xiangchao Yan, Jingqi Ye, Peng Ye, Fangchen Yu, Jiakang Yuan, Bihao Zhan, Bo Zhang, Chen Zhang, Shufei Zhang, Shuaiyu Zhang, Wenlong Zhang, Yiqun Zhang, Junpeng Zhao, Zhijie Zhong, Bowen Zhou, Yuhao Zhou

General AI

We introduce Agents-A1, a 35B Mixture-of-Experts Agentic Model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-ho…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.8

Discrete Diffusion Language Models for Interactive Radiology Report Drafting

2026-07-01 · Max Van Puyvelde, Halil Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert

General AI

Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, remain almost entirely autoregressive. We adapt a mixture-of-experts diffusion language mo…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.8

SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use

2026-07-02 · Jiayin Zhu, Kelong Mao, Yudong Guo, Dengbo He, Sulong Xu, Simiu Gu, Yutao Yue

General AI

Skills are becoming a reusable operational layer for LLM agents, encoding SOPs, domain rules, tool workflows, scripts, and validation routines. In realistic skill repositories, overlapping skills make reliable skill-use difficult. Final verifier success is too coarse for both evaluation and training, since an agent may…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.6

Next-Generation Agentic Reinforcement Learning Systems Enable Self-Evolving Agents

2026-07-01 · Ran Yan, Wei Fu, Jiale Li, Shusheng Xu, Zhiyu Mei, Jiaxuan Gao, Jiarui Zhang, Wentai Zhang, Hao Dai, Xujie Shen, Chuyi He, Zhen Pu, Jun Mei, Zhiyao Lin, Haitao Wang, Zhiqiang Ding, Jiawei Zhang, Huaijie Wang, Ruida Xu, Honghua Dong, Youhe Jiang, Yi Wu, Tongkai Yang, Binhang Yuan

General AI

LLM agents are rapidly being deployed in production, including coding assistants, customer-support chatbots, and scientific research assistants, yet they remain fundamentally static in enterprise deployment. The LLM weights, system prompts, tool repertoires, and in-context harnesses are frozen at deployment time, and a…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.6

Language Models as Measurement Apparatus for Culture

2026-07-02 · Kent K. Chang

General AI

Language models are increasingly used to quantify cultural phenomena, but what makes such measurement distinctively cultural? This paper argues that NLP work on culture is a material-discursive practice: the apparatus -- model, data, annotation, evaluation -- participates in constituting the cultural reality it measure…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.6

Online Safety Monitoring for LLMs

2026-07-02 · Mona Schirmer, Metod Jazbec, Alexander Timans, Christian Naesseth, Maja Waldron, Eric Nalisnick

General AI

Despite alignment training, LLMs remain prone to generating unsafe outputs at deployment time. Monitoring outputs online and raising an alarm when safety can no longer be assumed is therefore critical. We study a simple real-time monitor that turns a verifier signal from an external model into an alarm decision by thre…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.5

In-Browser Agents for Search Assistance

2026-01-14 · Saber Zerhoudi, Michael Granitzer

Research Track B · General AI

A fundamental tension exists between the demand for sophisticated AI assistance in web search and the need for user data privacy. Current centralized models require users to transmit sensitive browsing data to external services, which limits user control. In this paper, we present a browser extension that provides a vi…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.5

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

2026-02-10 · Talor Abramovich, Maor Ashkenazi, Carl Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman

General AI

Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and representative workloads are essential for accurately measuring its effectiveness. Existin…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.5

RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

2026-03-04 · Soroush Nasiriany, Sepehr Nasiriany, Abhiram Maddukuri, Yuke Zhu

Research Track A · General AI

Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present Rob…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.5

Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

2026-03-14 · Seokmin Lee, Yunghee Lee, Byeonghyun Pak, Byeongju Woo

General AI

For robotic agents operating in dynamic environments, learning visual state representations from streaming video observations is essential for sequential decision making. Recent self-supervised learning methods have shown strong transferability across vision tasks, but they do not explicitly address what a good visual …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.5

Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

2026-03-15 · Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee

General AI

Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improve LALM reasoning. We introduce three strategies using diverse i…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.5

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

2026-03-19 · Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang, Zhehuai Chen, Sung-Feng Huang, Chih-Kai Yang, Yi-Cheng Lin, Chi-Yuan Hsiao, Wenze Ren, En-Pei Hu, Yu-Han Huang, An-Yu Cheng, Cheng-Han Chiang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee

General AI

Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remain unclear. We study this gap by comparing different LLMs under two text-only and …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.5

AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

2026-03-22 · Liang Ding

Research Track B · General AI

LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER,…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.5

IntentWeave: A Progressive Entry Ladder for Multi-Surface Browser Agents in Cloud Portals

2026-03-24 · Wanying Mo, Jijia Lai, Xiaoming Wang

Research Track B · General AI

Browser agents built on LLMs can act in web interfaces, yet most remain confined to a single chat surface (e.g., a sidebar). This mismatch with real browsing can increase context-switching and reduce user control. We introduce IntentWeave, a design space of ten spatial paradigms for embedding agentic assistanc…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.5

DIET: Learning to Distill Dataset Continually for Recommender Systems

2026-03-26 · Jiaqing Zhang, Hao Wang, Mingjia Yin, Bo Chen, Qinglin Jia, Rui Zhou, Ruiming Tang, ChaoYi Ma, Enhong Chen

Research Track A · General AI

Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture comparison or iteration is prohibitively expensive, severely slowing down model deve…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.5

VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

2026-03-27 · Zhaochong An, Orest Kupyn, Théo Uscidda, Andrea Colaco, Karan Ahuja, Serge Belongie, Mar Gonzalez-Franco, Marta Tintore Gazulla

General AI

Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compromise the generaliza…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

2026-03-30 · Zhang Li, Zhibo Lin, Qiang Liu, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiajun Song, Jiarui Zhang, Xiang Bai, Yuliang Liu

General AI

We introduce Multilingual Document Parsing Benchmark, the first benchmark for multilingual digital and photographed document parsing. Document parsing has made remarkable strides, yet almost exclusively on clean, digital, well-formatted pages in a handful of dominant languages. No systematic benchmark exists to evaluat…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration

2026-03-31 · Qiyao Wang, Hongbo Wang, Longze Chen, Zhihao Yang, Guhong Chen, Hamid Alinejad-Rokny, Hui Li, Yuan Lin, Min Yang

General AI

Scientific idea generation (SIG) is critical to AI-driven autonomous research, yet existing approaches are often constrained by a static retrieval-then-generation paradigm, leading to homogeneous and insufficiently divergent ideas. In this work, we propose FlowPIE, a tightly coupled retrieval-generation framework that …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.5

Terminal Agents Suffice for Enterprise Automation

2026-03-31 · Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar

Research Track B · General AI

There has been growing interest in building agents that can interact with digital platforms to execute meaningful enterprise tasks autonomously. Among the approaches explored are tool-augmented agents built on abstractions such as Model Context Protocol (MCP) and web agents that operate through graphical interfaces. Ye…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Less Detail, Better Answers: Degradation-Driven Prompting for VQA

2026-04-06 · Haoxuan Han, Weijie Wang, Zeyu Zhang, Yefei He, Bohan Zhuang

Research Track A · General AI

Recent advancements in Vision-Language Models (VLMs) have significantly pushed the boundaries of Visual Question Answering (VQA). However, high-resolution details can sometimes become noise that leads to hallucinations or reasoning errors. In this paper, we propose Degradation-Driven Prompting (DDP), a novel framework tha…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Appear2Meaning: A Cross-Cultural Benchmark for Structured Cultural Metadata Inference from Images

2026-04-08 · Yuechen Jiang, Enze Zhang, Md Mohsinul Kabir, Qianqian Xie, Stavroula Golfomitsou, Konstantinos Arvanitis, Sophia Ananiadou

General AI

Recent advances in vision-language models (VLMs) have improved image captioning for cultural heritage. However, inferring structured cultural metadata (e.g., creator, origin, period) from visual input remains underexplored. We introduce a multi-category, cross-cultural benchmark for this task and evaluate VLMs using an…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

2026-04-08 · Jianhui Liu, Haoze Sun, Wenbo Li, Yanbing Zhang, Rui Yang, Zhiliang Zhu, Yijun Yang, Shenghe Zheng, Nan Jiang, Jiaxiu Jiang, Haoyang Huang, Tien-Tsin Wong, Nan Duan, Xiaojuan Qi

General AI

Spatial understanding is a fundamental cornerstone of human-level intelligence. Nonetheless, current research predominantly focuses on domain-specific data production, leaving a critical void: the absence of a principled, open-source engine capable of fully unleashing the potential of high-quality spatial data. To brid…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain

2026-04-12 · Song Jin, Juntian Zhang, Xun Zhang, Zeying Tian, Fei Jiang, Guojun Yin, Wei Lin, Yong Liu, Rui Yan

General AI

Recent advancements in Vision-Language Models (VLMs) have revolutionized general visual understanding. However, their application in the food domain remains constrained by benchmarks that rely on coarse-grained categories, single-view imagery, and inaccurate metadata. To bridge this gap, we introduce DiningBench, a hie…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

2026-04-12 · Udari Madhushani Sehwag, Elaine Lau, Haniyeh Ehsani Oskouie, Shayan Shabihi, Erich Liang, Andrea Toledo, Guillermo Mangialardi, Sergio Fonrouge, Ed-Yeremai Hernandez Cardona, Paula Vergara, Utkarsh Tyagi, Chen Bo Calvin Zhang, Pavi Bhatter, Nicholas Johnson, Furong Huang, Ernesto Gabriel Hernandez Montoya, Bing Liu

General AI

Accelerating scientific discovery requires the identification of which experiments would yield the best outcomes before committing resources to costly physical validation. While existing benchmarks evaluate LLMs on scientific knowledge and reasoning, their ability to predict experimental outcomes - a task where AI coul…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

2026-04-15 · Team HY-World, Chenjie Cao, Xuhui Zuo, Zhenwei Wang, Yisu Zhang, Junta Wu, Zhenyang Liu, Yuning Gong, Yang Liu, Bo Yuan, Chao Zhang, Coopers Li, Dongyuan Guo, Fan Yang, Haiyu Zhang, Hang Cao, Jianchen Zhu, Jiaxin Lin, Jie Xiao, Jihong Zhang, Junlin Yu, Lei Wang, Lifu Wang, Lilin Wang, Linus, Minghui Chen, Peng He, Penghao Zhao, Qi Chen, Rui Chen, Rui Shao, Sicong Liu, Wangchen Qin, Xiaochuan Niu, Xiang Yuan, Yi Sun, Yifei Tang, Yifu Sun, Yihang Lian, Yonghao Tan, Yuhong Liu, Yuyang Yin, Zhiyuan Min, Tengfei Wang, Chunchao Guo

General AI

We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view image inputs, the mo…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

An Optimal Transport-driven Approach for Cultivating Latent Space in Online Incremental Learning

2026-04-16 · Quyen Tran, Hai Nguyen, Hoang Phan, Quan Dao, Linh Ngo, Khoat Than, Dinh Phung, Dimitris Metaxas, Trung Le

General AI

In online incremental learning, data continuously arrives with substantial distributional shifts, creating a significant challenge because previous samples have limited replay value when learning a new task. Prior research has typically relied on either a single adaptive centroid or multiple fixed centroids to represen…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

2026-04-18 · Xinru Yan, Boxi Cao, Yaojie Lu, Hongyu Lin, Weixiang Zhou, Le Sun, Xianpei Han

General AI

Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this gap, we first systematically quantify modality preference of OLLMs using …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

HSG: Hyperbolic Scene Graph

2026-04-19 · Liyang Wang, Zeyu Zhang, Hao Tang

General AI

Scene graph representations enable structured visual understanding by modeling objects and their relationships, and have been widely used for multiview and 3D scene reasoning. Existing methods such as MSG learn scene graph embeddings in Euclidean space using contrastive learning and attention-based association. However…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.5

Mango: Multi-Agent Web Navigation via Global-View Optimization

2026-04-20 · Weixi Tong, Yifeng Di, Tianyi Zhang

Research Track B · General AI

Existing web agents typically initiate exploration from the root URL, which is inefficient for complex websites with deep hierarchical structures. Without a global view of the website's structure, agents frequently fall into navigation traps, explore irrelevant branches, or fail to reach target information within a lim…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.5

Mitigating Multimodal Hallucination via Phase-wise Self-reward

2026-04-20 · Yu Zhang, Chuyang Sun, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang

General AI

Large Vision-Language Models (LVLMs) still struggle with vision hallucination, where generated responses are inconsistent with the visual input. Existing methods either rely on large-scale annotated data for fine-tuning, which incurs massive computational overhead, or employ static post-hoc strategies that overlook the…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows

2026-04-22 · Hardy Chen, Nancy Lau, Haoqin Tu, Shuo Yan, Xiangyan Liu, Zijun Wang, Juncheng Wu, Michael Qizhe Shieh, Alvaro A. Cardenas, Cihang Xie, Yuyin Zhou

General AI

Frontier coding agents are increasingly used in workflows where users supervise progress primarily through repeated improvement of a public score, namely the reported score on a public evaluation file with labels in the workspace, rather than through direct inspection of the agent's intermediate outputs. We study wheth…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.5

ImageHD: Energy-Efficient On-Device Continual Learning of Visual Representations via Hyperdimensional Computing

2026-04-23 · Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna

Research Track A · General AI

On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative thr…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models

2026-04-23 · Mohammed Safi Ur Rahman Khan, Sanjay Suryanarayanan, Tushar Anand, Mitesh M. Khapra

General AI

Large Vision-Language Models (VLMs) are increasingly used to evaluate outputs of other models, both for image-to-text (I2T) tasks such as visual question answering and for text-to-image (T2I) generation tasks. Despite this growing reliance, the reliability of these Evaluator VLMs remains underexplored. In this work, we system…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

2026-04-24 · Zhengxu Yu, Yu Fu, Zhiyuan He, Yuxuan Huang, Lee Ka Yiu, Meng Fang, Weilin Luo, Jun Wang

General AI

Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. We argue that this reflects a deeper absence: a principled organisational layer that gove…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval

2026-04-25 · Yihan Wang, Lei Li, Yao Lai, Jing Wang, Yan Lu

General AI

Analog circuit design relies heavily on reusing existing intellectual property (IP), yet searching across heterogeneous representations such as SPICE netlists, schematics, and functional descriptions remains challenging. Existing methods are largely limited to exact matching within a single modality, failing to capture…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.5

ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation

2026-04-25 · Yizheng Huang, Wenjun Zeng, Aditi Kumaresan, Zi Wang

General AI

Evaluating generative AI models is increasingly resource-intensive due to slow inference, expensive raters, and a rapidly growing landscape of models and benchmarks. We propose ProEval, a proactive evaluation framework that leverages transfer learning to efficiently estimate performance and identify failure cases. ProE…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.5

PageGuide: Browser extension to assist users in navigating a webpage and locating information

2026-04-26 · Tin Nguyen, Thang T. Truong, Runtao Zhou, Trung Bui, Chirag Agarwal, Anh Totti Nguyen

Research Track B · General AI

Users browsing the web daily struggle to quickly locate relevant information in cluttered pages, complete unfamiliar multi-step tasks, and stay focused amid distracting content. State-of-the-art AI assistants (e.g., ChatGPT, Gemini, Claude) and browser agents (e.g., OpenAI Operator, Browser Use) can answer questions an…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

2026-04-26 · Qi Li, Bo Yin, Weiqi Huang, Ruhao Liu, Bojun Zou, Runpeng Yu, Jingwen Ye, Weihao Yu, Xinchao Wang

General AI

Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language, and state, real-time…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark

2026-04-27 · Hongxin Li, Xiping Wang, Jingran Su, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang

Research Track B · General AI

Autonomous agents capable of navigating Graphical User Interfaces (GUIs) hold the potential to revolutionize digital productivity. However, achieving true digital autonomy extends beyond reactive element matching; it necessitates a predictive mental model of interface dynamics and the ability to foresee the "digital wo…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning

2026-04-27 · Yiming Zhang, Jiacheng Chen, Jiaqi Tan, Yongsen Mao, Wenhu Chen, Angel X. Chang

General AI

Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally curated for traditional 3D perception. When such annotations are treated as ground truth …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Efficient Training on Multiple Consumer GPUs with RoundPipe

2026-04-29 · Yibin Luo, Shiwei Gao, Huichuan Zheng, Youyou Lu, Jiwu Shu

General AI

Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism (PP) combined with CPU offloading mitigates these hardware bottlenecks by reducing communication overhead. However, existing PP schedules suffer fr…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Leveraging Verifier-Based Reinforcement Learning in Image Editing

2026-04-30 · Hanzhong Guo, Jie Wu, Jie Liu, Yu Gao, Zilyu Ye, Linxiao Yuan, Xionghui Wang, Yizhou Yu, Weilin Huang

General AI

While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is the lack of a robust general reward model for all editing tasks. Existing edit reward models usually give overall scores wi…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

2026-05-01 · Indraneil Paul, Goran Glavaš, Iryna Gurevych

General AI

Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.5

Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning

2026-05-06 · William T. Redman, Erik C. Johnson, Brian Robinson

Research Track A · General AI

Identifying and exploiting common features across domains is at the heart of the human ability to make analogies, and is believed to be crucial for the ability to continually learn. To do this successfully, general and flexible computational strategies must be developed. While the extent to which Transformer neural net…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.5

Scene-Adaptive Continual Learning for CSI-based Human Activity Recognition with Mixture of Experts

2026-05-07 · Wenhan Zheng, Yuyi Mao, Ivan Wang-Hei Ho

Research Track A

Channel state information (CSI)-based human activity recognition (HAR) is vulnerable to performance degradation under domain shifts across varying physical environments. Continual learning (CL) offers a principled way to learn new domains sequentially while preserving past knowledge, but existing CL solutions for CSI-b…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

EarlyTom: Early Token Compression Completes Fast Video Understanding

2026-05-28 · Hesong Wang, Xin Jin, Lu Lu, Chenhaowen Li, Jian Chen, Qiang Liu, Huan Wang

Research Track A · General AI

Video large language models (Video-LLMs) have demonstrated strong capabilities in video understanding tasks. However, their practical deployment is still hindered by the inefficiency introduced by processing massive amounts of visual tokens. Although recent approaches achieve extremely low token retention ratios while …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.5

TRACE: Discovering Task-Specific Parameter via Adaptation-Aware Probing for Continual Fine-Tuning

2026-05-29 · Xiaosong Han, Ke Chen, Xindi Dai, Di Liang, Minlong Peng, Wei Pang, Fausto Giunchiglia, Xiaoyue Feng, Yonghao Liu, Renchu Guan

Research Track A · General AI

In real-world deployment, LLMs are often adapted continually across tasks to keep them up to date in production, where new fine-tuning should preserve previously learned skills. However, indiscriminately mixing tasks can dilute task specialization, while sequential fine-tuning (full-parameter or low-rank adaptation) of…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Answer Presence Drives RAG Rewriting Gains

2026-06-04 · Yuejie Li, Yueying Hua, Ke Yang, Li Zhang, Yueping He, Ruiqi Li, Bolin Chen, Tao Wang, Bowen Li, Chengjun Mao

General AI

Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before a smaller reader, lifting F1 by tens of points on multi-hop benchmarks; this gain is typically credited to improved evidence quality. We ask whether that lift is causally driven by the gold answer string appearing in the rewr…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.5

FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

2026-06-09 · Yu Lu, Junjie Yang, Piotr Koniusz, YuXin Song, Yi Yang

Research Track A · General AI

Autoregressive video generators synthesize long videos by generating successive temporal segments, but their historical KV cache grows with video length. Existing bounded-cache methods reduce this cost with local windows, sink tokens, or compressed memory states, yet they usually assign fixed roles to different parts o…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search

2026-06-10 · Zhuofan Shi, Mingzhe Ma, Lu Wang, Fangkai Yang, Pu Zhao, Yiming Guan, Youling Huang, Wei Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan

General AI

Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several directions look plausible but only some will later lead to reliable evidence. If an agent greedily follows the current best-look…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 9.5

Implicit Reasoning for Large Language Model-based Generative Recommendation

2026-06-15 · Yinhan He, Liam Collins, Bhuvesh Kumar, Jundong Li, Neil Shah, Donald Loveland

General AI

Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet reliably invoking this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs), disruptin…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory

2026-06-15 · Shuai Yang, Bingjie Gao, Ziwei Liu, Jiaqi Wang, Dahua Lin, Tong Wu

General AI

Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs struggle to maintain long-term consistency after such modifications, as stored contexts may …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

ActWorld: From Explorable to Interactive World Model via Action-Aware Memory

2026-06-16 · Zhexiao Xiong, Yizhi Song, Hao Kang, Qing Yan, Liming Jiang, Jenson Yang, Zhoujie Fu, Stathi Fotiadis, Angtian Wang, Zichuan Liu, Bo Liu, Yiding Yang, Xin Lu, Nathan Jacobs

Research Track A · General AI

Interactive world models aim to simulate environment dynamics under real-time user actions. However, their action vocabulary is largely confined to navigation: most actions correspond to motion (e.g., walk, turn, look around), while interaction with objects in the scene (e.g., pick up plates, open doors, or trigger phy…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 9.5

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

2026-06-16 · Byung-Kwan Lee, Ximing Lu, Shizhe Diao, Minki Kang, Saurav Muralidharan, Karan Sapra, Andrew Tao, Pavlo Molchanov, Yejin Choi, Yu-Chiang Frank Wang, Ryo Hachiuma

General AI

Knowledge distillation transfers a teacher's competence to a small student but is brittle in the small-student regime: forcing the student to imitate logits from a much larger teacher concentrates it on the teacher's sharpest modes, hurting generalization on benchmark families beyond the training corpus. Reinforcement …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Privacy Practices of Browser Agents

2025-12-08 · Alisha Ukani, Hamed Haddadi, Ali Shahin Shamsabadi, Peter Snyder

Research Track B · General AI

This paper presents a systematic evaluation of the privacy behaviors and attributes of eight recent, popular browser agents. Browser agents are software that automate Web browsing using large language models and ancillary tooling. However, the automated capabilities that make browser agents powerful also make them high…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.3

Cognitive Dark Matter: Measuring What AI Misses

2026-03-03 · Patrick J. Mineault, Thomas L. Griffiths, Sean Escola

Research Track A · General AI

We propose that the jagged intelligence landscape of modern AI systems arises from a missing training signal that we call "cognitive dark matter" (CDM): brain functions that meaningfully shape behavior yet are hard to infer from behavior alone. We identify key CDM domains: metacognition, cognitive flexibility, episodic …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.3

Back to Basics: Revisiting ASR in the Age of Voice Agents

2026-03-26 · Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li, Alex Smola

General AI

Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which condi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

2026-03-26 · Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang, Mingxi Cheng, Qi Dai, Yuqing Yang, Lili Qiu, Zhendong Wang, Zhengyuan Yang, Xue Yang, Lijuan Wang, Ji Li, Chong Luo

General AI

Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-wo…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models

2026-03-26 · Hai X. Pham, David T. Hoffmann, Ricardo Guerrero, Brais Martinez

General AI

Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard neg…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment

2026-03-26 · Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang

General AI

The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteB…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

2026-03-30 · Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or

General AI

Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a wide range of generat…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

ContextClaim: A Context-Driven Paradigm for Verifiable Claim Detection

2026-03-31 · Yufeng Li, Rrubaa Panchendrarajan, Arkaitz Zubiaga

General AI

Verifiable claim detection asks whether a claim expresses a factual statement that can, in principle, be assessed against external evidence. As an early filtering stage in automated fact-checking, it plays an important role in reducing the burden on downstream verification components. However, existing approaches to cl…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

One-for-All: A Lightweight Stabilized and Parameter-Efficient Pre-trained LLM for Time Series Forecasting

2026-03-31 · Prasanjit Dey, Soumyabrata Dev, Bianca Schoen-Phelan

General AI

We address the challenge of adapting pre-trained Large Language Models (LLMs) for multivariate time-series analysis, where their deployment is often hindered by prohibitive computational and memory demands. Our solution, One-for-All, introduces Gaussian Rank-Stabilized Low-Rank Adapters (rsLoRA) to enable parameter-eff…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Is One Token All It Takes? Graph Pooling Tokens for LLM-based GraphQA

2026-04-01 · Ankit Grover, Lodovico Giaretta, Rémi Bourgerie, Sarunas Girdzijauskas

General AI

The integration of Graph Neural Networks (GNNs) with Large Language Models (LLMs) has emerged as a promising paradigm for Graph Question Answering (GraphQA). However, effective methods for encoding complex structural information into the LLM's latent space remain an open challenge. Current state-of-the-art architecture…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models

2026-04-02 · Sarath Shekkizhar, Romain Cosentino, Adam Earle

General AI

Standard LLM benchmarks evaluate the assistant turn: the model generates a response to an input, a verifier scores correctness, and the analysis ends. This paradigm leaves unmeasured whether the LLM encodes any awareness of what follows the assistant response. We propose user-turn generation as a probe of this gap: giv…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation

2026-04-02 · Chongjie Ye, Cheng Cao, Chuanyu Pan, Yiming Hao, Yihao Zhi, Yuanming Hu, Xiaoguang Han

General AI

Recent multimodal large language models have achieved strong performance in unified text and image understanding and generation, yet extending such native capability to 3D remains challenging due to limited data. Compared to abundant 2D imagery, high-quality 3D assets are scarce, making 3D synthesis under-constrained. …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

2026-04-07 · Yanis Labrak, David Grünert, Séverin Baroudi, Jiyun Chun, Pawel Cyrta, Sergio Burdisso, Ahmed Hassoon, David Liu, Adam Rothschild, Reed Van Deusen, Petr Motlicek, Andrew Perrault, Ricard Marxer, Thomas Schaaf

General AI

Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to s…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Agentic Compilation: Mitigating the LLM Rerun Crisis for Minimized-Inference-Cost Web Automation

2026-04-08 · Jagadeesh Chundru

Research Track B · General AI

LLM-driven web agents operating through continuous inference loops -- repeatedly querying a model to evaluate browser state and select actions -- exhibit a fundamental scalability constraint for repetitive tasks. We characterize this as the Rerun Crisis: the linear growth of token expenditure and API latency relative t…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 9.3

Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models

2026-04-09 · Feng Luo, Yu-Neng Chuang, Guanchu Wang, Zicheng Xu, Xiaotian Han, Tianyi Zhang, Vladimir Braverman

General AI

On-policy distillation (OPD) trains student models under their own induced distribution while leveraging supervision from stronger teachers. We identify a failure mode of OPD: as training progresses, on-policy rollouts can undergo abrupt length inflation, causing truncated trajectories to dominate the training data. Th…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

ParseBench: A Document Parsing Benchmark for AI Agents

2026-04-09 · Boyang Zhang, Sebastián G. Acosta, Preston Carlson, Sacha Bron, Pierre-Loïc Doulcet, Simon Suo

General AI

AI agents are changing the requirements for document parsing. What matters is semantic correctness: parsed output must preserve the structure and meaning needed for autonomous decisions, including correct table structure, precise chart data, semantically meaningful formatting, and visual grounding. Existing benc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

A Mechanistic Analysis of Looped Reasoning Language Models

2026-04-13 · Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong

General AI

Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics d…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Enhancing Program Repair with Specification Guidance and Intermediate Behavioral Signals

2026-04-13 · Minh Le-Anh, Cuong Chi Le, Tien N. Nguyen

General AI

Automated Program Repair (APR) has recently benefited from large language models (LLMs). However, most LLM-based APR approaches still rely primarily on coarse end-to-end signals from test-suite outcomes to guide repair, providing limited insight into where a program's internal logic deviates from its intended behavior.…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

2026-04-13 · Donghao Zhou, Guisheng Liu, Hao Yang, Jiatong Li, Jingyu Lin, Xiaohu Huang, Yichen Liu, Xin Gao, Cunjian Chen, Shilei Wen, Chi-Wing Fu, Pheng-Ann Heng

General AI

In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference images, audio, and pose. This task holds significant practical value for automating content creation in real-world applications, such as e-commer…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure

2026-04-13 · Federico Bottino, Carlo Ferrero, Nicholas Dosio, Pierfrancesco Beneventano

General AI

Organizational knowledge used by AI agents typically lacks epistemic structure: retrieval systems surface semantically relevant content without distinguishing binding decisions from abandoned hypotheses, contested claims from settled ones, or known facts from unresolved questions. We argue that the ceiling on organizat…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Causal Diffusion Models for Counterfactual Outcome Distributions in Longitudinal Data

2026-04-14 · Farbod Alinezhad, Jianfei Cao, Gary J. Young, Brady Post

General AI

Predicting counterfactual outcomes in longitudinal data, where sequential treatment decisions heavily depend on evolving patient states, is critical yet notoriously challenging due to complex time-dependent confounding and inadequate uncertainty quantification in existing methods. We introduce the Causal Diffusion Mode…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

2026-04-14 · Yecheng Wu, Song Han, Hai Cai

General AI

On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, standard OPD requires a live teacher inference server throughout training, resulting in substantial infrastructure overhead. In this work, we investigate whether on-policy distillation can be performed of…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

2026-04-14 · Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain

General AI

Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Although existing automated program repair techniques primarily focus on repairing memory corruption vulnerabilities, they struggle with logical vulnerabilities because of their …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

AD4AD: Benchmarking Visual Anomaly Detection Models for Safer Autonomous Driving

2026-04-16 · Fabrizio Genilotti, Arianna Stropeni, Gionata Grotto, Francesco Borsatti, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto

General AI

The reliability of a machine vision system for autonomous driving depends heavily on its training data distribution. When a vehicle encounters significantly different conditions, such as atypical obstacles, its perceptual capabilities can degrade substantially. Unlike many domains where errors carry limited consequence…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling

2026-04-16 · Jianxuan Yang, Xinyue Guo, Zhi Cheng, Kai Wang, Lipan Zhang, Jinjie Hu, Qiang Ji, Yihua Cao, Yihao Meng, Zhaoyue Cui, Mengmei Liu, Meng Meng, Jian Luan

General AI

Recent advances in video-to-audio (V2A) generation enable high-quality audio synthesis from visual content, yet achieving robust and fine-grained controllability remains challenging. Existing methods suffer from weak textual controllability under visual-text conflict and imprecise stylistic control due to entangled tem…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency

2026-04-16 · Boyan Li, Ou Ocean Kun Hei, Yue Yu, Yuyu Luo

General AI

While Large Language Models (LLMs) demonstrate impressive proficiency in generating SQL queries, they fundamentally lack the capability to self-evaluate correctness without an execution oracle. This limitation creates a stark Generation-Selection Gap, where high potential accuracy (Pass@K) fails to translate into execu…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Exploring Agentic Visual Analytics: A Co-Evolutionary Framework of Roles and Workflows

2026-04-17 · Tianqi Luo, Leixian Shen, Yuyu Luo

General AI

Agentic visual analytics (VA) represents an emerging class of systems in which large language model (LLM)-driven agents autonomously plan, execute, evaluate, and iterate across the full visual analytics pipeline. By shifting users from low-level tool operations to high-level analytical goals expressed through natural l…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Using Large Language Models and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing

2026-04-17 · Thomas Bayer, Alexander Lohr, Sarah Weiß, Bernd Michelberger, Wolfram Höpken

General AI

Explaining Machine Learning (ML) results in a transparent and user-friendly manner remains a challenging task of Explainable Artificial Intelligence (XAI). In this paper, we present a method to enhance the interpretability of ML models by using a Knowledge Graph (KG). We store domain-specific data along with ML results…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

2026-04-19 · Ziao Zhang, Kou Shi, Shiting Huang, Avery Nie, Yu Zeng, Yiming Zhao, Zhen Fang, Qishen Su, Haibo Qiu, Wei Yang, Qingnan Ren, Shun Zou, Wenxuan Huang, Lin Chen, Zehui Chen, Feng Zhao

Research Track A · General AI

As the capability frontier of autonomous agents continues to expand, they are increasingly able to complete specialized tasks through plug-and-play external skills. Yet current benchmarks mostly test whether models can use provided skills, leaving open whether they can discover skills from experience, repair them after…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources

2026-04-20 · Raghvendra Kumar, Devankar Raj, Sriparna Saha

General AI

India's linguistic landscape, spanning 22 scheduled languages and hundreds of marginalized dialects, has driven rapid growth in NLP datasets, benchmarks, and pretrained models. However, no dedicated survey consolidates resources developed specifically for Indian languages. Existing reviews either focus on a few high-re…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

ReCap: Lightweight Referential Grounding for Coherent Story Visualization

2026-04-20 · Aditya Arora, Akshita Gupta, Pau Rodriguez, Marcus Rohrbach

General AI

Story Visualization aims to generate a sequence of images that faithfully depicts a textual narrative while preserving character identity, spatial configuration, and stylistic coherence as the narrative unfolds. Maintaining such cross-frame consistency has traditionally relied on explicit memory banks, architectural expan…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

2026-04-20 · Weicheng Lin, Yi Zhang, Jiawei Dang, Liang-Jie Zhang

General AI

Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning method for large language models, with its effectiveness largely influenced by the allocation of ranks and scaling factors, as well as initialization. Existing LoRA variants typically address only one of these factors, often at the c…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning

2026-04-21 · Yuan Zhuang, Yuexin Bian, Sihong He, Jie Feng, Qing Su, Songyang Han, Jonathan Petit, Shihao Ji, Yuanyuan Shi, Fei Miao

General AI

Scaling critic capacity is a promising direction for enhancing off-policy reinforcement learning (RL). However, larger critics are prone to overfitting and unstable in replay-buffer-based bootstrap training. This paper leverages Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy critics. Our…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

PC2Model: ISPRS benchmark on 3D point cloud to model registration

2026-04-21 · Mehdi Maboudi, Said Harb, Jackson Ferrao, Kourosh Khoshelham, Yelda Turkan, Karam Mawas

General AI

Point cloud registration involves aligning one point cloud with another or with a three-dimensional (3D) model, enabling the integration of multimodal data into a unified representation. This is essential in applications such as construction monitoring, autonomous driving, robotics, and virtual or augmented reality (VR…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

2026-04-21 · Abhinav Agarwal

General AI

LLM-assisted defect discovery has a precision crisis: plausible-but-wrong reports overwhelm maintainers and degrade credibility for real findings. We present Refute-or-Promote, an inference-time reliability pattern combining Stratified Context Hunting (SCH) for candidate generation, adversarial kill mandates, context a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

EVENT5Ws: A Large Dataset for Open-Domain Event Extraction from Documents

2026-04-23 · Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi

General AI

Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Therefore, it is necessary to develop automated event extraction approaches. However, existing datasets for algorithm development…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Grounding Video Reasoning in Physical Signals

2026-04-23 · Alibay Osmanli, Zixu Cheng, Shaogang Gong

General AI

Physical video understanding requires more than naming an event correctly. A model can answer a question about pouring, sliding, or collision from textual regularities while still failing to localize the event in time or space. We introduce a grounded benchmark for physical video understanding that extends the what--wh…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Seeing Fast and Slow: Learning the Flow of Time in Videos

2026-04-23 · Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma

General AI

How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual conc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

2026-04-23 · Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu, Peng Di

General AI

Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, extracting actionabl…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines

2026-04-24 · Negar Arabzadeh, Andrew Drozdov, Michael Bendersky, Matei Zaharia

General AI

Large Language Models (LLMs) have made query reformulation ubiquitous in modern retrieval and Retrieval-Augmented Generation (RAG) pipelines, enabling the generation of multiple semantically equivalent query variants. However, executing the full pipeline for every reformulation is computationally expensive, motivating …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges

2026-04-24 · Hyo Jin Jon, Longbin Jin, Eun Yi Kim

General AI

CLIP has demonstrated strong generalization in visual domains through natural language supervision, even for video action recognition. However, most existing approaches that adapt CLIP for action recognition have primarily focused on temporal modeling, often overlooking spatial perception. In real-world scenarios, visu…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

GazeVLA: Learning Human Intention for Robotic Manipulation

2026-04-24 · Chengyang Li, Kaiyi Xiong, Yuan Xu, Lei Qian, Yizhou Wang, Wentao Zhu

General AI

Embodied foundation models have achieved significant breakthroughs in robotic manipulation, yet they still depend heavily on large-scale robot demonstrations. Although recent works have explored leveraging human data to alleviate this dependency, effectively extracting transferable knowledge remains a significant chall…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

QuantClaw: Precision Where It Matters for OpenClaw

2026-04-24 · Manyi Zhang, Ji-Fu Li, Zhongao Sun, Xiaohao Liu, Zhenhua Dong, Xianzhi Yu, Haoli Bai, Xiaobo Xia

General AI

Autonomous agent systems such as OpenClaw introduce significant efficiency challenges due to long-context inputs and multi-turn reasoning. This results in prohibitively high computational and monetary costs in real-world development. While quantization is a standard approach for reducing cost and latency, its impact on…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft

2026-04-27 · Zhou Ziheng, Huacong Tang, Jinyuan Zhang, Haowei Lin, Bangcheng Yang, Qian Long, Fang Sun, Yizhou Sun, Yitao Liang, Ying Nian Wu, Demetri Terzopoulos, Xiaofeng Gao

Research Track A · General AI

Discovering causal regularities and applying them to build functional systems (the discovery-to-application loop) is a hallmark of general intelligence, yet evaluating this capacity has been hindered by the vast complexity gap between scientific discovery and real-world engineering. We introduce SciCrafter, a Minecraft…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis

2026-04-27 · Amal Akli, Mike Papadakis, Maxime Cordy, Yves Le Traon

General AI

Large language models are widely used for code generation, yet they rely on an implicit assumption that the task descriptions are sufficiently detailed and well-formed. However, in practice, users may provide defective descriptions, which can have a strong effect on code correctness. To address this issue, we develop S…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Meta-CoT: Enhancing Granularity and Generalization in Image Editing

2026-04-27 · Shiyi Zhang, Yiji Cheng, Tiankai Hang, Zijin Yin, Runze He, Yu Xu, Wenxun Dai, Yunlong Lin, Chunyu Wang, Qinglin Lu, Yansong Tang

General AI

Unified multi-modal understanding/generative models have shown improved image editing performance by incorporating fine-grained understanding into their Chain-of-Thought (CoT) process. However, a critical question remains underexplored: what forms of CoT and training strategy can jointly enhance both the understanding …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

2026-04-28 · Jiahang Lin, Shichun Liu, Chengjun Pan, Lizhi Lin, Shihan Dou, Xuanjing Huang, Hang Yan, Zhenhua Han, Tao Gui

General AI

Harnesses have become a central determinant of coding-agent performance, shaping how models interact with repositories, tools, and execution environments. Yet automating harness engineering is hard: a heterogeneous action space, sparse and noisy evaluation signal, multi-million-token trajectories, and edits whose effec…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

2026-04-28 · Ajmain Inqiad Alam, Palash Roy, Chanchal K. Roy, Banani Roy, Kevin A. Schneider

General AI

The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities in different SE tasks, they are unmanageably large, slow to deploy, memory-intensive, and carbon-heavy. Thi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

2026-04-28 · Jinxiang Meng, Shaoping Huang, Fangyu Lei, Jingyu Guo, Haoxiang Liu, Jiahao Su, Sihan Wang, Yao Wang, Enrui Wang, Ye Yang, Hongze Chai, Jinming Lv, Anbang Yu, Huangjing Zhang, Yitong Zhang, Yiming Huang, Zeyao Ma, Shizhu He, Jun Zhao, Kang Liu

General AI

Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and assumption of perfect intent. To bridge these gaps, we introduce DV-World, a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Explainable AI for Jet Tagging: A Comparative Study of GNNExplainer, GNNShap, and GradCAM for Jet Tagging in the Lund Jet Plane

2026-04-28 · Pahal D. Patel, Sanmay Ganguly

General AI

Graph neural networks such as ParticleNet and transformer based networks on point clouds such as ParticleTransformer achieve state-of-the-art performance on jet tagging benchmarks at the Large Hadron Collider, yet the physical reasoning behind their predictions remains opaque. We present different methods, i.e. perturb…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

From Threads to Trajectories: A Multi-LLM Pipeline for Community Knowledge Extraction from GitHub Issue Discussions

2026-04-28 · Nazia Shehnaz Joynab, Soneya Binta Hossain

General AI

Resolving complex post-production issues in large-scale open-source software (OSS) projects requires significant cognitive effort, because developers must first work through long, unstructured, and fragmented issue discussion threads. In this paper, we present SWE-MIMIC-Bench, an issue trajectory dataset generated…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Electricity price forecasting across Norway's five bidding zones in the post-crisis era

2026-04-29 · My Thi Diem Phan, Trung Tuyen Truong, Hoai Phuong Ha, Dat Thanh Nguyen

General AI

Norway's electricity market is heavily dominated by hydropower, but the 2021–2022 energy crisis and stronger integration with Continental Europe have fundamentally altered price formation, reducing the reliability of forecasting models calibrated on historical data. Despite the critical need for updated models, a unif…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Multimodal LLMs are not all you need for Pediatric Speech Language Pathology

2026-04-29 · Darren Fürst, Sebastian Steindl, Ulrich Schäfer

General AI

Speech Sound Disorders (SSD) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular multi-task SLPHelmUltraSuitePlus benchmark. We propose a cascading approach from binary …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Virtual-reality based patient-specific simulation of spine surgical procedures: A fast, highly automated and high-fidelity system for surgical education and planning

2026-04-29 · Raj Kumar Ranabhat, Tayler D Ross, Tony Jiao, Jeremie Larouche, Joel Finkelstein, Michael Hardisty

General AI

Surgical training involves didactic teaching, mentor-led learning, surgical skills laboratories, and direct exposure to surgery; however, increasing clinical pressures have limited operating room (OR) exposure. This work leverages virtual reality (VR) to provide a safe and immersive training environment. Existing VR tr…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning

2026-04-29 · Wanyue Zhang, Wenxiang Wu, Wang Xu, Jiaxin Luo, Helu Zhi, Yibin Huang, Shuo Ren, Zitao Liu, Jiajun Zhang

General AI

Vision-language models (VLMs) have shown strong performance on static visual understanding, yet they still struggle with dynamic spatial reasoning that requires imagining how scenes evolve under egocentric motion. Recent efforts address this limitation either by scaling spatial supervision with synthetic data or by cou…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

2026-04-30 · Chenxin Li, Zhengyang Tang, Huangxin Lin, Yunlong Lin, Shijue Huang, Shengyuan Liu, Bowen Ye, Rang Li, Lei Li, Benyou Wang, Yixuan Yuan

General AI

LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Low Rank Adaptation for Adversarial Perturbation

2026-04-30 · Han Liu, Shanghao Shi, Yevgeniy Vorobeychik, Chongjie Zhang, Ning Zhang

General AI

Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved the training efficiency of Large Language Models (LLMs) by updating neural network layers using low-rank matrices. Since the generation of adversarial examples is an optimiz…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

2026-04-30 · Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin

General AI

Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are increasingly applied to critical tasks such as function and variable name recovery and type inference. However, despite the…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles

2026-04-30 · Zainab Rehan, Christian Medeiros Adriano, Sona Ghahremani, Holger Giese

General AI

Rule-based systems remain central in safety-critical domains but often struggle with scalability, brittleness, and goal misspecification. These limitations can lead to reward hacking and failures in formal verification, as AI systems tend to optimize for narrow objectives. In previous research, we developed a neuro-sym…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

2026-05-01 · Yutao Hou, Yihan Jiang, Yuhan Xie, Jian Yang, Liwen Zhang, Hailiang Huang, Guanhua Chen, Yun Chen

General AI

Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Position: agentic AI orchestration should be Bayes-consistent

2026-05-01 · Theodore Papamarkou, Pierre Alquier, Matthias Bauer, Wray Buntine, Andrew Davison, Gintare Karolina Dziugaite, Maurizio Filippone, Andrew Y. K. Foong, Vincent Fortuin, Dimitris Fouskakis, Jes Frellsen, Eyke Hüllermeier, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Nikita Kotelevskii, Salem Lahlou, Yingzhen Li, Fang Liu, Clare Lyle, Thomas Möllenhoff, Konstantina Palla, Maxim Panov, Yusuf Sale, Kajetan Schweighofer, Artem Shelmanov, Siddharth Swaroop, Martin Trapp, Willem Waegeman, Andrew Gordon Wilson, Alexey Zaytsev

General AI

LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this p…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws

2026-06-04 · Mengzhuo Chen, Junjie Wang, Zhe Liu, Yawen Wang, Qing Wang

General AI

LLM-based agents increasingly rely on harnesses that provide execution environments, tool interfaces, context, lifecycle orchestration, observability, verification, and governance. Existing self-improving agents and automatic harness evolution methods mainly improve agents through runtime supervision, prompt optimizati…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

2026-06-04 · Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin, Jocelyn Shen, Blake A. Richards, Alison Gopnik, Doina Precup

General AI

A long-standing finding in the causal learning literature is that adults struggle to identify conjunctive causal rules, where an effect requires the simultaneous presence of multiple causes, while performing better in disjunctive settings. However, most demonstrations of this "conjunctive handicap" rely on passive ob…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 9.3

PAMF: Prior-Aware Multimodal Fusion for Incomplete Time Series Data

2026-06-04 · Ziwen Kan, Wugeng Zheng, Tianlong Chen, Song Wang

General AI

In healthcare, multimodal time series tasks often operate on incomplete observations in practice, for example when ECG segments are lost because electrodes detach or an entire respiratory channel is unavailable during overnight monitoring. Such missingness typically appears in two structurally distinct patterns: within…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

2026-06-04 · Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter

General AI

Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on the Group Relative Policy Optimization (GRPO) algorithm or modifications thereof to steer the models to produce Chain-of-Thought (CoT) traces. The final answer can only be verified, an…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

Signal-Driven Observation for Long-Horizon Web Agents

2026-06-04 · Shubham Gaur, Ian Lane

Research Track B · General AI

Web agents operating over long horizons ingest raw DOM and accessibility trees -- routinely tens of thousands of tokens -- at every action step, causing progressive context degradation that erodes reasoning well before tasks complete. We argue that this coupling of observation frequency to action frequency is an archit…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.3

TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

2026-06-04 · Eric Spencer, Arslan Bisharat, Brian Ortiz, Khushboo Bhadauria, TaiNing Wang, George K. Thiruvathukal, Konstantin Laufer, Mohammed Abuhamad

General AI

TLA+ is a formal specification language for verifying distributed systems and safety-critical protocols. Large language models (LLMs) frequently produce TLA+ specifications that fail the TLC model checker for semantic reasons. Across 25 LLMs, the best public baseline is 26.6% syntactic parse and 8.6% semantic model-che…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

2026-06-04 · Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu, Huaxiu Yao, Zhiwu Lu, Mingyu Ding

General AI

Robot manipulation alternates between low-risk transit phases that call for fast execution and high-risk contact stages that demand slow, precise motion. Yet existing Vision-Language-Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

2026-06-04 · Chenming Zhu, Jingli Lin, Yilin Long, Peizhou Cao, Tai Wang, Jiangmiao Pang, Xihui Liu

General AI

While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason from alternative viewp…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

Auditing Training Data in Domain-adapted LLMs: LoRA-MINT

2026-06-05 · Gonzalo Mancera, Daniel DeAlcala, Aythami Morales, Julian Fierrez, Ruben Tolosana, Francisco Jurado

General AI

We present LoRA-MINT, a new methodology for Membership Inference Test (MINT) applied to recent Large Language Models (LLMs) fine-tuned for specific Natural Language Processing (NLP) tasks through Low-Rank Adaptation (LoRA). The primary goal is to assess whether individual samples were part of the training data of these…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.3

Learning All-Terrain Locomotion for a Planetary Rover with Actively Articulated Suspension

2026-06-05 · Arthur Bouton, Tristan D. Hasseler, Michael Paton, Travis Brown, Jacob Levy, William Reid, Joshua Martin, Hari Nayar

Research Track A · General AI

This paper presents ERNEST, a four-wheeled planetary rover concept equipped with a two-degree-of-freedom Active Gimbal Suspension that combines yaw and roll actuation to enable wheel reconfiguration, steering, and active load redistribution. A single neural network controller, trained to track a desired path across cha…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.3

AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

2026-06-08 · Jisong Cai, Long Ling, Shiwei Chu, Zhongshan Liu, Jiayue Kang, Zhixuan Liang, Wenjie Xu, Yinan Mao, Weinan Zhang, Xiaokang Yang, Ru Ying, Ran Zheng, Yao Mu

General AI

World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world prediction and action execution at the same temporal resolution, forcing the world branch…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

Observability for Delegated Execution in Agentic AI Systems

2026-06-08 · Abhinav Mishra, Kumar Sharad

General AI

Delegation-scoped execution is not identifiable from standard observables: audit logs and execution traces can be identical under multiple incompatible delegation assignments. This gap is especially acute in LLM-based agentic systems, where agents dynamically select tools, vary execution sequences across runs for the s…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

2026-06-08 · Pu Ning, Quan Chen, Kun Tao, Xinyu Tang, Tianshu Wang, Qianggang Cao, Xinyu Kong, Zujie Wen, Zhiqiang Zhang, Jun Zhou

General AI

Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and r…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

iMaC: Translating Actions into Motion and Contact Images for Embodied World Models

2026-06-08 · Zhenyu Wu, Xiuwei Xu, Yukun Zhou, Yifan Li, Qiuping Deng, Xiaofeng Wang, Zheng Zhu, Bingyao Yu, Ziwei Wang, Jiwen Lu, Haibin Yan

General AI

Embodied world models have emerged as a pivotal paradigm for visual robotic decision-making and interactive environment simulation. However, conventional embodied frameworks rely on low-dimensional structured action vectors (e.g., joint angles and end-effector poses), which suffer from limited expressive capacity, poor…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

2026-06-09 · Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An, Yihang Chen, Cho-Jui Hsieh

General AI

Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token can be non-unique, noisy, or misaligned with the model prior. Strictly fitting toward this one-hot target may be suboptimal, especially when the pretrained model encodes a rich knowled…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

2026-06-09 · Weixian Xu, Shilong Liu, Mengdi Wang

General AI

In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-dataset settings, while real-world applications require models to handle heterogeneous input str…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

A RAG-Enhanced Bi-Level Cognitive Orchestration Framework for LEO Satellite Networks

2026-06-13 · Yuhong Jiang, Zhishu Shen, Tong Yin, Qiushi Zheng, Yichao Jin, Fidan Mehmeti, Jiong Jin

General AI

The rapid growth of remote sensing data in Low Earth Orbit (LEO) satellite networks is increasingly constrained by limited downlink capacity to terrestrial networks. Satellite edge computing alleviates this pressure by enabling in-orbit data processing. However, it introduces a new challenge of spatio-temporal resource…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.3

R2RDreamer: 3D-aware Data Augmentation for Spatially-generalized 2D Manipulation Policies

2026-06-15 · Xiuwei Xu, Haowen Sun, Angyuan Ma, Yiwei Zhang, Zhenyu Wu, Xiaofeng Wang, Bingyao Yu, Zheng Zhu, Jie Zhou, Jiwen Lu

General AI

Spatial generalization is critical for imitation-learned manipulation policies, but achieving it typically requires scaling demonstrations across diverse object poses, robot configurations, and camera viewpoints. Data augmentation from a few source demonstrations offers a practical alternative to costly real-world coll…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

Scalable Circuit Learning for Interpreting Large Language Models

2026-06-15 · Naiyu Yin, Dennis Wei, Tian Gao, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Yue Yu

General AI

A prominent research direction in mechanistic interpretability is learning sparse circuits over LLM components to reveal how they jointly produce model behavior. However, raw neurons are polysemantic, making learned circuits hard to interpret. Sparse autoencoder (SAE) features alleviate this, but their high dimensional…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.3

Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners

2026-06-16 · Xiaojun Jia, Jie Liao, Simeng Qin, Ke Ma, Wenbo Guo, Yebo Feng, Aishan Liu, Yang Liu

General AI

Agent skills are emerging as an important attack surface in LLM-based systems. Through an empirical study of existing skill scanners, we find that current defenses primarily rely on textual descriptions, manifests, and source code as the main signals for security analysis, which can leave visually conveyed malicious in…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.2

Bayesian Adaptation Gym: A Benchmark for the Bayesian Low-Rank Adaptation of Multi-Modal Language Models

2026-06-20 · Colin Samplawski, Ramneet Kaur, Manoj Acharya, Anirban Roy, Adam D. Cobb

General AI

Large multi-modal language models are increasingly deployed in high-stakes domains, making well-calibrated uncertainty essential. Traditional Bayesian methods approximate posteriors over all model weights, which becomes intractable for modern large models. For this reason, recent work instead considers Bayesian low-ran…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.2

"Zooming In" on Agentic Web Browsers as Assistive Technologies: A Case Study with a Low-Vision Technology Expert

2026-06-23 · Laura Colazzo, Giuseppe Anzillotti

General AI

Agentic Web Browsers (AWBs), powered by Large Language Models (LLMs), are emerging as autonomous systems capable of navigating the Web on behalf of users. Beyond enhancing productivity, they could also offer significant promise as Assistive Technologies (ATs) for visually-impaired individuals, transforming web interact…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.2

OpenThoughts-Agent: Data Recipes for Agentic Models

2026-06-23 · Negin Raoof, Richard Zhuang, Marianna Nezhurina, Etash Guha, Atula Tejaswi, Ryan Marten, Charlie F. Ruan, Tyler Griggs, Alexander Glenn Shaw, Hritik Bansal, E. Kelly Buchanan, Artem Gazizov, Reinhard Heckel, Chinmay Hegde, Sankalp Jajee, Daanish Khazi, Emmanouil Koukoumidis, Xiangyi Li, Hange Liu, Shlok Natarajan, Harsh Raj, Nicholas Roberts, Ethan Shen, Nishad Singhi, Michael Siu, Ashima Suvarna, Hanwen Xing, Patrick Yubeaton, Robert Zhang, Leon Liangyu Chen, Xiaokun Chen, Steven Dillmann, Saadia Gabriel, Xunyi Jiang, Anurag Kashyap, Boxuan Li, Yein Park, Minh Pham, Sujay Sanghavi, Lin Shi, Ke Sun, Yixin Wang, Zhiwei Xu, Erica Zhang, Siyan Zhao, Wanjia Zhao, Jenia Jitsev, Alex Dimakis, Benjamin Feuer, Ludwig Schmidt

General AI

Agentic language models dramatically expand the applications of AI yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that ge…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.2

World Models in Pieces: Structural Certification for General Agents

2026-06-23 · Yikai Lu, Yifei Wu, Xinyu Lu, Tongxin Li

General AI

In the big-world regime, agents cannot be universally capable and their ability is inevitably specialized across a world model in pieces. Consequently, standard uniform guarantees fail to distinguish between the understanding of critical bottlenecks and irrelevant failures. We first formalize this limitation by proving…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.2

Model Forensics: Investigating Whether Concerning Behavior Reflects Misalignment

2026-06-24 · Aditya Singh, Gerson Kroiz, Senthooran Rajamanoharan, Neel Nanda

General AI

A central goal of safety research is determining whether a model is misaligned. Prior work has largely focused on detecting concerning behavior. But behavior alone does not establish misalignment: a concerning action can arise from benign causes such as confusion. This motivates model forensics: investigating whether t…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.2

On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity

2026-06-24 · Andrei Liviu Nicolicioiu, Mohammad Pezeshki, Aaron Courville

General AI

On-policy self-distillation achieves strong pass@1 accuracy by using a single model as both teacher and student, with the teacher conditioned on a correct demonstration to provide dense token-level feedback. We show that this could come at a hidden cost: rollout diversity decreases and pass@k curves flatten (i.e., gene…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.2

The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems

2026-06-24 · Seth Dobrin, Łukasz Chmiel

General AI

AI agents are granted access to tools, APIs, and other infrastructure, making them active principals in those systems. The dominant approach places controls inside the agent's own runtime: system prompts, output filters, and guardrail libraries. Any control in the agent's address space is reachable by inputs that influ…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.0

Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

2026-03-24 · Connor Mclaughlin, Nigel Lee, Lili Su

Research Track A

Machine learning models often need to adapt to new data after deployment due to structured or unstructured real-world dynamics. The Continual Learning (CL) framework enables continuous model adaptation, but most existing approaches either assume each task contains sufficiently many data samples or that the learning tas…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.0

Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

2026-04-09 · Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo

Research Track A · General AI

Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. A field-wide goal is to achieve generalizable, cross-subject models. A major obstacle towards this goal is the substanti…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.0

TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

2026-04-14 · Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang

Research Track A

Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal transferability across platforms. In this paper, we introduce TCL, a novel efficient an…

Review: pending
Role: unreviewed
Read: now

arxiv Score 9.0

Adaptive Unknown Fault Detection and Few-Shot Continual Learning for Condition Monitoring in Ultrasonic Metal Welding

2026-04-15 · Ahmadreza Eslaminia, Kuan-Chieh Lu, Klara Nahrstedt, Chenhui Shao

Research Track A

Ultrasonic metal welding (UMW) is widely used in industrial applications but is sensitive to tool wear, surface contamination, and material variability, which can lead to unexpected process faults and unsatisfactory weld quality. Conventional monitoring systems typically rely on supervised learning models that assume a…

Review: pending
Role: unreviewed
Read: now

huggingface Score 9.0

Do not copy and paste! Rewriting strategies for code retrieval

2026-05-08 · Andrea Gurioli, Federico Pennino, Maurizio Gabbrielli

General AI

Embedding-based code retrieval often suffers when encoders overfit to surface syntax. Prior work mitigates this by using LLMs to rephrase queries and corpora into a normalized style, but leaves two questions open: how much representational shift helps, and when is the per-query LLM call justified? We study a hierarchy …

Review: pending
Role: unreviewed
Read: soon

huggingface Score 9.0

L2P: Unlocking Latent Potential for Pixel Generation

2026-05-12 · Zhennan Chen, Junwei Zhu, Xu Chen, Jiangning Zhang, Jiawei Chen, Zhuoqi Zeng, Wei Zhang, Chengjie Wang, Jian Yang, Ying Tai

General AI

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich…

Review: pending
Role: unreviewed
Read: now

huggingface Score 9.0

WildRelight: A Real-World Benchmark and Physics-Guided Adaptation for Single-Image Relighting

2026-05-12 · Lezhong Wang, Mehmet Onurcan Kaya, Siavash Bigdeli, Jeppe Revall Frisvad

General AI

Recent single-image relighting methods, powered by advanced generative models, have achieved impressive photorealism on synthetic benchmarks. However, their effectiveness in the complex visual landscape of the real world remains largely unverified. A critical gap exists, as current datasets are typically designed for m…

Review: pending
Role: unreviewed
Read: now

huggingface Score 9.0

Benchmarking Composed Image Retrieval for Applied Earth Observation

2026-05-23 · Bill Psomas, Dionysis Christopoulos, Thanasis Petropoulos, Nikos Efthymiadis, Ioannis Kakogeorgiou, Ondřej Chum, Yannis Avrithis, Giorgos Tolias, Konstantinos Karantzalos

General AI

Remote sensing composed image retrieval (RSCIR) enables search in large satellite image archives using composed queries that combine a reference image with a textual modifier. Although RSCIR offers a flexible interface for expressing targeted retrieval intent, the transferability of modern composition methods to Earth …

Review: pending
Role: unreviewed
Read: soon

huggingface Score 9.0

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM

2026-05-24 · Yubo Li, Yidi Miao

Research Track B · General AI

Long-horizon LLM inference turns the key-value (KV) cache into the dominant GPU memory consumer and makes per-token attention increasingly expensive. Many common eviction policies use static recency windows or historical attention, leaving unused a signal computed on every decoding step: the model's current uncertaint…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.0

IORM: Hierarchical I/O Governance for Thousands of Consolidated Databases on Oracle Exadata

2026-05-27 · Rajarshi Chowdhury, Akshay Shah, Zakaria Alrmaih, Chenhao Guo, Anubhav Singh, Sue Lee

Research Track A · General AI

Oracle Exadata consolidates thousands of tenant databases onto shared storage infrastructure deployed at hundreds of customer sites worldwide. Oracle Multitenant architecture enables this extreme density, with thousands of tenant databases sharing a single Exadata storage system -- but this creates a multi-level resour…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 9.0

PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers

2026-05-27 · Ngoc Phan Phuoc Loc, Toan Huynh La Viet, Thanh Tran Khanh, Duy A Nguyen, Tuan Anh Nguyen Pham, Thanh Nguyen, Nitesh V. Chawla, Wray Buntine, Kok-Seng Wong, Khoa D. Doan, Binh T. Nguyen

General AI

The rapid growth in submissions to machine learning venues has strained the scientific peer-review system and intensified interest in LLM-based automated peer reviewers. However, how good these systems actually are, especially compared to human reviewers at catching scientific gaps, remains poorly understood. In this w…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.0

In-Context Reward Adaptation for Robust Preference Modeling

2026-05-28 · Zhenyu Sun, Zheng Xu, Ermin Wei

Research Track A · General AI

Reinforcement Learning from Human Feedback (RLHF) typically relies on static reward models to align Large Language Models with human preferences. However, human values are inherently diverse and heterogeneous, and a single reward model often lacks the robustness required to generalize to unseen preference domains. Whil…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 9.0

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

2026-05-29 · Wai-Chung Kwan, Aryo Pradipta Gema, Joshua Ong Jun Leang, Pasquale Minervini

General AI

Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We introduce SCOPE, a data-free self-play framework for open-ended tasks that co-evolves two policies: a Challenger …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.0

CloudCons: A Comprehensive End-to-End Benchmark for Cloud Resource Consolidation

2026-06-11 · Xiaobin Zhang, Lefei Shen, Mouxiang Chen, Zhuo Li, Hongkai Li, Han Fu, Jianling Sun, Xiaoxue Ren, Chenghao Liu

Research Track A · General AI

Driven by conservative over-provisioning to guarantee service reliability, resource utilization in cloud data centers remains at low levels. To mitigate this, the forecast-then-optimize paradigm has emerged to optimize consolidation by anticipating future demands. While emerging time series foundation models promise to…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.0

A Compositional Framework for Open-ended Intelligence

2026-06-13 · Ida Momennejad, Roberta Raileanu

Research Track A

Open-ended intelligence is the capacity to adapt to novel problems and environments that are substantially different from those in training. A mathematics of open-ended intelligence requires two pillars: first, a minimal set of representational primitives (e.g., states, actions) and algorithmic primitives (e.g., neares…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.0

SCAN: A Decision-Making Framework for Effective Task Allocation with Generative AI

2026-06-14 · Fendi Tsim, Alina Gutoreva

Research Track A

We introduce SCAN, a human-centric decision-making framework that helps learners allocate tasks effectively with Generative Artificial Intelligence (GenAI), based on Vygotsky's Zone of Proximal Development and Metacognition. In SCAN, we systematize and formalize AI-human interaction by introducing a task-identif…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 9.0

Energy-Aware Scheduling for Serverless LLM Serving on Shared GPUs

2026-06-29 · Tianyu Wang, Gourav Rattihalli, Aditya Dhakal, Longfei Shangguan, Dejan Milojicic

Research Track A

As LLM inference becomes a major cloud workload, its growing energy footprint makes cluster-wide energy optimization increasingly important. Serverless LLM serving helps platforms absorb traffic volatility by elastically sharing GPU resources across models, but this sharing also makes energy optimization difficult. Mul…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 8.8

Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense

2026-05-04 · Mingming Zha, Xiaofeng Wang

General AI

Autonomous LLM agents operate as long-running processes with persistent workspaces, memory files, scheduled task state, and messaging integrations. These features create a new propagation risk: attacker-influenced content can be written into persistent agent state, re-enter the LLM decision context through scheduled au…

Review: pending
Role: unreviewed
Read: now

arxiv Score 8.8

FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents

2026-05-04 · Quang Hieu Pham, Yang He, Ping Nie, Canwen Xu, Davood Rafiei, Yuepeng Wang, Xi Ye, Jocelyn Qiaochu Chen

General AI

Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline where schema elements are retrieved once upfront and the database is only revisited for post-hoc repair, limiting recovery fr…

Review: pending
Role: unreviewed
Read: now

arxiv Score 8.8

OphMAE: Bridging Volumetric and Planar Imaging with a Foundation Model for Adaptive Ophthalmological Diagnosis

2026-05-04 · Tienyu Chang, Zhen Chen, Renjie Liang, Jinyu Ding, Jie Xu, Sunu Mathew, Amir Reza Hajrasouliha, Andrew J. Saykin, Ruogu Fang, Yu Huang, Jiang Bian, Qingyu Chen

General AI

The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance…

Review: pending
Role: unreviewed
Read: now

arxiv Score 8.8

TRACE: Temporal Reasoning over Context and Evidence for Activity Recognition in Smart Homes

2026-05-04 · Yingtian Shi, Abivishaq Balasubramanian, Jessica Herring, Jiachen Li, Juan Macias Romero, Rosemarie Santa Gonzalez, Varun Mishra, Agata Rozga, Xiang Zhi Tan, Thomas Plötz

General AI

Human activity recognition (HAR) in smart homes remains challenging because many daily activities exhibit similar local sensor patterns, while minimally intrusive sensing provides sparse and ambiguous observations. As a result, methods based on short temporal or event windows often fail to capture the broader temporal …

Review: pending
Role: unreviewed
Read: now

arxiv Score 8.8

When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

2026-05-04 · Pehuén Moure, Niclas Pokel, Bilal Bounajma, Yingqiang Gao, Roman Boehringer, Longbiao Cheng, Shih-Chii Liu

General AI

Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models can make use of such information. We int…

Review: pending
Role: unreviewed
Read: now

arxiv Score 8.8

PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation

2026-05-06 · Srikar Kashyap Pulipaka

General AI

We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language mode…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 8.8

SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

2026-05-07 · Xiaofang Xiao, Guangchao Li, Guangrong Zhao, Qi Lin, Wen Ma, Hongkai Wen, Yanxiang Wang, Yiran Shen

General AI

Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 8.8

Beyond Binary Edits: Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment

2026-05-22 · Haoyuan Wang, Xiaohao Liu, Jiajie Su, Jianmao Xiao, Chaochao Chen

General AI

Multimodal large language models (MLLMs) need efficient mechanisms to update knowledge without degrading existing capabilities. While intrinsic multimodal knowledge editing achieves strong reliability and locality, it often exhibits limited generality, failing to propagate edits across semantically equivalent visual an…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 8.8

LLMSurgeon: Diagnosing Data Mixture of Large Language Models

2026-05-28 · Yaxin Luo, Jiacheng Cui, Xiaohan Zhao, Xinyi Shang, Jiacheng Liu, Xinyue Bi, Zhaoyi Li, Zhiqiang Shen

General AI

The pretraining data mixture of Large Language Models (LLMs) constitutes their "digital DNA", shaping model behaviors, capabilities, and failure modes. Yet this composition is rarely disclosed, making post-hoc auditing of data combination or provenance difficult. In this work, we formalize $\textbf{Data Mixture Surgery…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 8.8

Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

2026-05-29 · Jiazheng Xing, Hangjie Yuan, Lingling Cai, Xinyu Liu, Yujie Wei, Fei Du, Hai Ci, Tao Feng, Jiasheng Tang, Weihua Chen, Fan Wang, Yong Liu

General AI

Connector-based video unified models have demonstrated strong capability in instruction-grounded video synthesis, but integrating a large high-fidelity generator into the unified training loop is computationally prohibitive, limiting achievable visual quality. We therefore propose Lumos-Nexus, a training-efficient unif…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 8.8

OrcaRouter: A Production-Oriented LLM Router with Hybrid Offline-Online Learning

2026-05-29 · Zhenghua Bao, Fengya Tian, Chris Zhang, Zhenjun Chen, Xile Ma, Yi Shi

General AI

The rapid development of large language models, each with distinct capabilities and inference costs, raises a practical deployment question: given an incoming request, which model should handle it? We present OrcaRouter, a production-oriented LLM router that combines a LinUCB-based contextual bandit over lexical and se…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

VisionPulse: Dynamic Visual Sparsity for Efficient Multimodal Reasoning

2026-05-29 · Hengbo Xu, Shengjie Jin, Yanbiao Ma, Zhiwu Lu

General AI

With the rapid advancement of large multimodal models (LMMs), inference-time overhead has become a key bottleneck for real-world deployment. Existing methods typically prune visual tokens at prefill, assuming the required visual evidence remains static during reasoning. However, we empirically show that visual evidence…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

What Am I Missing? Question-Answering as Hidden State Probing

2026-05-29 · Chu Fei Luo, Samuel Dahan, Xiaodan Zhu

General AI

Test-time reasoning has become a significant field of study since the introduction of chain-of-thought reasoning in large language models (LLMs). However, the mechanisms of this reasoning process are still under-explored -- from the same input prompt, and even the same partial solution, LLMs can produce varied answers …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

Fog Computing and Large Language Models: A vision for the mutual beneficiaries

2026-06-28 · Satish Narayana Srirama

General AI

Fog computing utilizes proximal computational resources for sensor data processing and actuation, and addresses the latency, network load, and privacy issues of cloud-centric Internet of Things. On the other hand, Large Language Models (LLMs) are a type of deep learning AI models, which are trained on enormous text dat…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

Building Multi-Task Agentic LLMs via Two-Phase Distillation

2026-06-29 · Huaijie Wang, Shusheng Xu, Yi Wu, Kaifeng Lyu

General AI

A key step toward artificial general intelligence is to train models that can perform multiple tasks. In this paper, we study how to build such models by first training separate RL experts for individual tasks and then consolidating them via distillation, as an alternative to directly training a single model on mixed t…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

MESA: Prioritizing Vulnerable Communication Channels for Securing Multi-Agent Systems

2026-06-29 · Kunyang Li, Kyle Domico, Jonathan Gregory, Patrick McDaniel

General AI

Multi-agent systems (MAS) are increasingly used to automate complex, distributed workflows. However, their inter-agent communication channels introduce new attack surfaces that remain poorly understood and are difficult to defend against. In this paper, we address how defenders should prioritize limited security effort…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

Open-Vocabulary and Referring Segmentation for 3D Gaussians Using 2D Detectors

2026-06-29 · Jameel Hassan, Yasiru Ranasinghe, Vishal Patel

General AI

3D Gaussian Splatting (3DGS) has emerged at the forefront of 3D scene reconstruction. Extending 3DGS with language-driven, open-vocabulary understanding has gained significant attention for real-world applications such as embodied AI. Recent methods achieve this by learning an instance feature attribute and assigning s…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms

2026-06-29 · Ziwei Su, Junyu Ren, Victor Veitch

General AI

Contrastive embedding models trained with scale-invariant losses are typically paired with distance metrics like cosine similarity, effectively ignoring embedding magnitudes. However, surprisingly, empirical studies reveal that despite this, these "discarded" norms seem to correlate with semantic properties such as con…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

Orca: The World is in Your Mind

2026-06-29 · Yihao Wang, Yuheng Ji, Mingyu Cao, Yanqing Shen, Runze Xiao, Huaihai Lyu, Senwei Xie, Euan Liu, Klara Tian, Tianfeng Long, Yichi Zhang, Zhengliang Cai, Ruike Chen, Jifan Zhao, Ruochuan Shi, Zihan Tang, Jing Lyu, Wenxing Tan, Ningbo Zhang, Yangtao Hu, Yuming Gao, Xiansheng Chen, Junkai Zhao, Congsheng Xu, Boan Zhu, Ziqi Wang, Yupu Feng, Qiongqiong Zhang, Yingli Zhao, Yulong Ao, Shaoxuan Xie, You Liu, Guocai Yao, Leiduo Zhang, Xiaodan Liu, Yunyan Zhang, Yance Jiao, Xinyan Yang, Jiaxing Wei, Xu Liu, Tengfei Pan, Shaokai Nie, Chunlei Men, Sen Cui, Xiaojie Jin, Hongyang Li, Jianlan Luo, Yao Mu, Yunchao Wei, Jun Yan, Hang Zhao, Xiaolong Zheng, Jiaming Li, Yonghua Lin, Tiejun Huang, Zhongyuan Wang, Pengwei Wang

General AI

We introduce Orca, an initial instantiation of a general world foundation model. Orca learns a unified world latent space from multimodal world signals and exposes it through multimodal readout interfaces. Rather than optimizing isolated next-token, next-frame, or next-action prediction, we are centered on Next-State-P…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

Poller: Are LLMs Suitable for Evaluating the Poetry Understanding Task?

2026-06-29 · Shanshan Wang, Derek F. Wong, Jingming Yao, Lidia S. Chao

General AI

Traditional automatic evaluation methods have been shown to be unsuitable for modern Chinese poetry because of the distinct nature of this literary genre. Human evaluation remains reliable, but is expensive and not applicable to large-scale data. In this paper, we propose Poller (Poetry LLM Evaluator), a novel method l…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.8

Set-Inclusive Uncertainty Modeling for Robust Brain Tumor Segmentation

2026-06-29 · Seunghun Baek, Jihwan Park, Jaeyoon Sim, Hoseok Lee, Seungjoo Lee, Won Hwa Kim

General AI

Multimodal MRI is essential for accurate brain tumor segmentation. However, acquiring all modalities at inference is often challenging in practice, which causes intrinsic uncertainty due to unavoidable information loss. Without modeling this uncertainty, existing methods encode incomplete evidence into deterministic re…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.8

AGVBench: A Reliability-Oriented Benchmark of Data Augmentation for Vein Recognition

2026-07-02 · Haiyang Li, Yuming Fu, Qun Song, Hongchao Liao, Jing Chen, Mounim A. El-Yacoubi, Xin Jin

General AI

Vein recognition is a secure biometric technology often constrained by limited annotated data and imaging variations. While data augmentation mitigates this, strategies designed for natural images may disrupt the fine-grained topology and textures essential for identity discrimination. We present AGVBench, which evalua…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.6

Adoption and Ecosystem Health: A Longitudinal Analysis of Open-Source Multi-Agent Frameworks

2026-07-02 · Xi Zhang, Papi Menon, Vivian Chu, Koray Cosguner

General AI

Since ChatGPT's launch in November 2022, open-source agentic frameworks have proliferated, making framework selection important for engineering teams while obscured by popularity signals such as GitHub stars. This paper analyzes 15 major open-source AI agent framework repositories from late 2022 to early 2026, using 80…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing

2026-02-22 · Juan Rodriguez, Haotian Zhang, Abhay Puri, Tianyang Zhang, Rishav Pramanik, Meng Lin, Xiaoqing Xie, Marco Terral, Darsh Kaushik, Aly Shariff, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli

General AI

We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. Our benchmark comprises four t…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

Working Notes on Late Interaction Dynamics: Analyzing Targeted Behaviors of Late Interaction Models

2026-03-27 · Antoine Edy, Max Conti, Quentin Macé

General AI

While Late Interaction models exhibit strong retrieval performance, many of their underlying dynamics remain understudied, potentially hiding performance bottlenecks. In this work, we focus on two topics in Late Interaction retrieval: a length bias that arises when using multi-vector scoring, and the similarity distrib…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.5

Alertness Optimization for Shift Workers Using a Physiology-based Mathematical Model

2026-03-30 · Zidi Tao, A. Agung Julius, John T Wen

Research Track A

Sleep is vital for maintaining cognitive function, facilitating metabolic waste removal, and supporting memory consolidation. However, modern societal demands, particularly shift work, often disrupt natural sleep patterns. This can induce excessive sleepiness among shift workers in critical sectors such as healthcare a…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 8.5

DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing

2026-03-30 · Kailai Feng, Yuxiang Wei, Bo Chen, Yang Pan, Hu Ye, Songwei Liu, Chenqian Yan, Yuan Gao

General AI

Diffusion models have made significant progress in both text-to-image (T2I) generation and text-guided image editing. However, these models are typically built with billions of parameters, leading to high latency and increased deployment challenges. While on-device diffusion models improve efficiency, they largely focu…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 8.5

CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

2026-03-31 · Shifang Zhao, Yihan Hu, Ying Shan, Yunchao Wei, Xiaodong Cun

General AI

Editing the video content with audio alignment forms a digital human-made art in current social media. However, the time-consuming and repetitive nature of manual video editing has long been a challenge for filmmakers and professional content creators alike. In this paper, we introduce CutClaw, an autonomous multi-agen…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 8.5

MedGemma 1.5 Technical Report

2026-04-06 · Andrew Sellergren, Chufan Gao, Fereshteh Mahvar, Timo Kohlberger, Fayaz Jamil, Madeleine Traverse, Alberto Tono, Bashir Sadjad, Lin Yang, Charles Lau, Liron Yatziv, Tiffany Chen, Bram Sterling, Kenneth Philbrick, Richa Tiwari, Yun Liu, Madhuram Jajoo, Chandrashekar Sankarapu, Swapnil Vispute, Harshad Purandare, Abhishek Bijay Mishra, Sam Schmidgall, Tao Tu, Anil Palepu, Chunjong Park, Tim Strother, Rahul Thapa, Yong Cheng, Preeti Singh, Kat Black, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Joelle Barral, Tris Warkentin, Shravya Shetty, Dale Webster, Sunny Virmani, David F. Steiner, Can Kirmizibayrak, Daniel Golden

General AI

We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes and histopathology whole slide images), anatomical localization via bounding boxes, multi-timepoint chest X-ray analysis,…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

Small Vision-Language Models are Smart Compressors for Long Video Understanding

2026-04-09 · Junjie Fei, Jun Chen, Zechun Liu, Yunyang Xiong, Chong Zhou, Wei Wen, Junlin Han, Mingchen Zhuge, Saksham Suri, Qi Qian, Shuming Liu, Lemeng Wu, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Chenchen Zhu

General AI

Adapting Multimodal Large Language Models (MLLMs) for hour-long videos is bottlenecked by context limits. Dense visual streams saturate token budgets and exacerbate the lost-in-the-middle phenomenon. Existing heuristics, like sparse sampling or uniform pooling, blindly sacrifice fidelity by discarding decisive moments …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation

2026-04-10 · Aarush Sinha, Arion Das, Soumyadeep Nag, Charan Karnati, Shravani Nag, Chandra Vadhan Raj, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das

General AI

As large language models (LLMs) are increasingly deployed as autonomous agents, understanding how strategic behavior emerges in multi-agent environments has become an important alignment challenge. We take a neutral empirical stance and construct a controlled environment in which strategic behavior can be directly obse…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context

2026-04-13 · Shuquan Lian, Juncheng Liu, Yazhe Chen, Yuhong Chen, Hui Li

General AI

Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning required for deep analysis and handling complex edge cases. While recent reasoning models demonstrate the potential of extended Chain-of-Thought (CoT), applying them to the multi-turn SWE …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.5

Lyra 2.0: Explorable Generative 3D Worlds

2026-04-14 · Tianchang Shen, Sherwin Bahmani, Kai He, Sangeetha Grama Srinivasan, Tianshi Cao, Jiawei Ren, Ruilong Li, Zian Wang, Nicholas Sharp, Zan Gojcic, Sanja Fidler, Jiahui Huang, Huan Ling, Jun Gao, Xuanchi Ren

Research Track A

Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, then lifting them to 3D via feed-forward reconstruction techniques. This generative reconstruction approach combines the visual fidelity and creative capacity of video m…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.5

A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation

2026-04-15 · Julian Killingback, Ofer Meshi, Henry Li, Hamed Zamani, Maryam Karimzadehgan

Research Track A · General AI

Traditional Retrieval-Augmented Generation (RAG) approaches generally assume that retrieval and generation occur on powerful servers removed from the end user. While this reduces local hardware constraints, it introduces significant drawbacks: privacy concerns regarding data access, recurring maintenance and storage co…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

2026-04-16 · Ido Galil, Moshe Kimhi, Ran El-Yaniv

General AI

Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of parameter bits. We introduce Deep Neural Lesion (DNL), a data-free and optimization-free method that locates critical parameters, and an enhanced single-pass variant, 1P-DNL, that refines this selection with one forward and backw…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training

2026-04-16 · Yifu Chen, Shengpeng Ji, Qian Chen, Tianle Liang, Yangzhuo Li, Ziqing Wang, Wen Wang, Jingyu Lu, Haoxiao Wang, Xueyi Pu, Fan Zhuo, Zhou Zhao

General AI

End-to-end spoken dialogue models have garnered significant attention because they offer a higher potential ceiling in expressiveness and perceptual ability than cascaded systems. However, the intelligence and expressiveness of current open-source spoken dialogue models often remain below expectations. Motivated by the…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers

2026-04-19 · Qingcheng Zeng, Yuheng Lu, Zeqi Zhou, Heli Qi, Puxuan Yu, Fuheng Zhao, Hitomi Yanaka, Weihao Xuan, Naoto Yokoya

General AI

Code-switching is a pervasive linguistic phenomenon in global communication, yet modern information retrieval systems remain predominantly designed for, and evaluated within, monolingual contexts. To bridge this critical disconnect, we present a holistic study dedicated to code-switching IR. We introduce CSR-L (Code-Sw…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

Dual-View Training for Instruction-Following Information Retrieval

2026-04-20 · Qingcheng Zeng, Puxuan Yu, Aman Mehta, Fuheng Zhao, Rajhans Samdani

General AI

Instruction-following information retrieval (IF-IR) studies retrieval systems that must not only find documents relevant to a query, but also obey explicit user constraints such as required attributes, exclusions, or output preferences. However, most retrievers are trained primarily for semantic relevance and often fai…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

Hybrid Policy Distillation for LLMs

2026-04-22 · Wenhong Zhu, Ruobing Xie, Rui Wang, Pengfei Liu

General AI

Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections bet…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.5

Learning Hippo: Multi-attractor Dynamics and Stability Effects in a Biologically Detailed CA3 Extension of Hopfield Networks

2026-04-22 · Daniele Corradetti, Renato Corradetti

Research Track A · General AI

We present a biologically detailed extension of the classical Hopfield/Marr auto-associative memory model for CA3, implementing ten populations (two asymmetric pyramidal subtypes, eight GABAergic interneuron classes), forty-seven compartments, multi-rule plasticity (recurrent Hebb, BCM anti-saturation, mossy-fiber shor…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 8.5

Encoder-Free Human Motion Understanding via Structured Motion Descriptions

2026-04-23 · Yao Zhang, Zhuchenyang Liu, Thomas Ploetz, Yu Xiao

General AI

The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-langua…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition

2026-04-23 · Kwan Yun, Changmin Lee, Ayeong Jeong, Youngseo Kim, Seungmi Lee, Junyong Noh

General AI

Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, exhibit severe brittleness under stylization. They often mis…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 8.5

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

2026-04-24 · Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, Yichen Zhu

General AI

Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy evaluation. In this paper, we propose dWorldEval, which uses a discrete diffusion world model as a scalable evaluation p…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

2026-04-27 · NVIDIA, Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Arushi Goel, Mike Ranzinger, Greg Heinrich, Guo Chen, Lukas Voegtle, Philipp Fischer, Timo Roman, Karan Sapra, Collin McCarthy, Shaokun Zhang, Fuxiao Liu, Hanrong Ye, Yi Dong, Mingjie Liu, Yifan Peng, Piotr Zelasko, Zhehuai Chen, Nithin Rao Koluguri, Nune Tadevosyan, Lilit Grigoryan, Ehsan Hosseini Asl, Pritam Biswas, Leili Tavabi, Yuanhang Su, Zhiding Yu, Peter Jin, Alexandre Milesi, Netanel Haber, Yao Xu, Sarah Amiraslani, Nabin Mulepati, Eric Tramel, Jaehun Jung, Ximing Lu, Brandon Cui, Jin Xu, Zhiqi Li, Shihao Wang, Yuanguo Kuang, Huck Yang, Boyi Li, Hongxu Yin, Song Han, Pavlo Molchanov, Adi Renduchintala, Charles Wang, David Mosallanezhad, Soumye Singhal, Luis Vega, Katherine Cheung, Sreyan Ghosh, Yian Zhang, Alexander Bukharin, Venkat Srinivasan, Johnny Greco, Andre Manoel, Maarten Van Segbroeck, Suseella Panguliri, Rohit Watve, Divyanshu Kakwani, Shubham Pachori, Jeffrey Glick, Radha Sri-Tharan, Aileen Zaman, Khanh Nguyen, Shi Chen, Jiaheng Fang, Qing Miao, Wenfei Zhou, Yu Wang, Zaid Pervaiz Bhat, Varun Praveen, Arihant Jain, Ramanathan Arunachalam, Tomasz Kornuta, Ashton Sharabiani, Amy Shen, Wei Huang, Yi-Fu Wu, Ali Roshan Ghias, Huiying Li, Brian Yu, Nima Tajbakhsh, Chen Cui, Wenwen Gao, Li Ding, Terry Kong, Manoj Kilaru, Anahita Bhiwandiwalla, Marek Wawrzos, Daniel Korzekwa, Pablo Ribalta, Grzegorz Chlebus, Besmira Nushi, Ewa Dobrowolska, Maciej Jakub Mikulski, Kunal Dhawan, Steve Huang, Jagadeesh Balam, Yongqiang Wang, Nikolay Karpov, Valentin Mendelev, George Zelenfroynd, Meline Mkrtchyan, Omri Almog, Bhavesh Pawar, Rameshwar Shivbhakta, Sudeep Sabnis, Ashrton Sharabiani, Negar Habibi, Geethapriya Venkataramani, Pamela Peng, Prerit Rodney, Serge Panev, Richard Mazzarese, Nicky Liu, Michael Fukuyama, Andrii Skliar, Roger Waleffe, Duncan Riach, Yunheng Zou, Jian Hu, Hao Zhang, Binfeng Xu, Yuhao Yang, Zuhair Ahmed, Carlo del Mundo, Chad Voegele, Zhiyu Cheng, Nave Assaf, Daniel Afrimi, Natan Bagrov, Ran Zilberstein, Ofri Masad, Eugene Khvedchenia, Borys Tymchenko, Tomer Asida, Parth Mannan, Victor Cui, Michael Evans, Katherine Luna, Jie Lou, Pinky Xu, Guyue Huang, Michael Boone, Pradeep Thalasta, Adeola Adesoba, Dina Yared, Christopher Parisien, Leon Derczynski, Shaona Ghosh, Wes Feely, Micah Schaffer, Barnaby Simkin, Tomasz Grzegorzek, Rishabh Garg, Aastha Jhunjhunwala, Sergei Kolchenko, Farzan Memarian, Haran Kumar, Shiv Kumar, Isabel Hulseman, Anjali Shah, Kari Briski, Padmavathy Subramanian, Joey Conway, Udi Karpas, Jane Polak Scowcroft, Annie Surla, Shilpa Ammireddy, Ellie Evans, Jesse Oliver, Tom Balough, Chia-Chih Chen, Sandip Bhaskar, Alejandra Rico, Bardiya Sadeghi, Seph Mard, Meredith Price, Laya Sleiman, Saori Kaji, Wesley Helmholz, Wendy Quan, Michael Lightstone, Jonathan Cohen, Jian Zhang, Oleksii Kuchaiev, Boris Ginsburg, Jan Kautz, Eileen Long, Mohammad Shoeybi, Mostofa Patwary, Oluwatobi Olabiyi, Andrew Tao, Bryan Catanzaro

Research Track B · General AI

We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements over its predecessor, Nemotron Nano V2 VL, across all modalities, enabled by advances in …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data

2026-04-27 · Mohammadmehdi Ataei, Farzaneh Askari, Kamal Rahimi Malekshan, Pradeep Kumar Jayaraman

General AI

Computer-Aided Design (CAD) models are defined by their construction history: a parametric recipe that encodes design intent. However, existing large-scale 3D datasets predominantly consist of boundary representations (B-Reps) or meshes, stripping away this critical procedural information. To address this scarcity, we …

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 8.5

BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate

2026-04-28 · Arnon Mazza, Elad Levi

General AI

Deploying guardrails for custom policies remains challenging, as generic safety models fail to capture task-specific requirements, while prompting LLMs suffers from inconsistent boundary-case performance and high inference costs. Training custom classifiers achieves both accuracy and efficiency, yet demands substantial…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.5

WAAA! Web Adversaries Against Agentic Browsers

2026-05-06 · Sohom Datta, Alex Nahapetyan, William Enck, Alexandros Kapravelos

Research Track B · General AI

Large language models (LLMs) are increasingly being integrated into web browsers to create agentic browsing systems that execute actions on behalf of the user. Prior work considering the security of agentic browsers focuses exclusively on indirect prompt-injection attacks. However, by failing to consider traditional we…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

2026-05-07 · Pranav Mantini, Shishir K. Shah

Research Track A

We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed in…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.5

Masked Generative Transformer Is What You Need for Image Editing

2026-05-11 · Wei Chow, Linfeng Li, Xian Sun, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, Tian Ye, Xian Wang, Jinbin Bai, Shilin Xu, Xiangtai Li, Junting Pan, Shaoteng Liu, Ran Zhou, Tianshu Yang, Songhua Liu

Research Track A · General AI

Diffusion models dominate image editing, yet their global denoising mechanism entangles edited regions with surrounding context, causing modifications to propagate into areas that should remain intact. We propose a fundamentally different approach by leveraging Masked Generative Transformers (MGTs), whose localized tok…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.5

Self-Regulated Learning in Essay Writing: Consistency of Strategies and Impact on Outcomes

2026-05-14 · Gloria Fernández-Nieto, Kiyoshige Garcés, Mladen Raković, Tongguang Li, Xinyu Li, Linxuan Zhao, Dragan Gašević

Research Track A

Background: Abilities for effective self-regulated learning (SRL) are critical for lifelong learning, particularly during adolescence when these skills consolidate and strongly influence future learning. Their importance has grown with the rise of online and blended education. Yet, little is known about how secondary s…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

EMMA: Extracting Multiple physical parameters from Multimodal Data

2026-05-21 · Farhat Shaikh, Ayan Banerjee, Sandeep Gupta

General AI

We introduce EMMA, a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Unlike prior video-only approaches that struggle with occluded states, hidden actuation inputs, or assumptions about known …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.5

Towards Explainability of SLMs by investigating Token Level Activation

2026-05-21 · Sayantani Ghosh, Rajashik Datta, Amit Kumar Das, Amlan Chakrabarti

Research Track A

Transformer-based language models such as BERT having 110M+ parameters have revolutionized natural language understanding, yet their internal mechanisms remain largely opaque to researchers and practitioners. Traditional attention-based interpretability methods often emphasize structurally important but semantically we…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.5

Developing a UXR Point of View for Cognitive Accessibility in Mobile Learning with Generative AI

2026-05-29 · Fatima Ahmad Muazu, Festus Adedoyin, Huseyin Dogan, Abiodun Adedeji, Melike Akca, Olumuyiwa Ayorinde

Research Track A · General AI

This study investigates how UX research (UXR) principles, combined with Large Language Model (LLM)-supported analysis, can be used to improve the quality of requirements for mobile learning systems designed for learners with cognitive disabilities. Using the UXR Point-of-View (PoV) pyramid as a methodological framework…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

2026-06-01 · Richard Schwarzkopf, Fabian Immel, Alexander Blumberg, Jonas Merkert, Nils Rack, Kaiwen Wang, Fabian Konstantinidis, Julian Truetsch, Carlos Fernandez, Annika Bätz, Kevin Rösch, Marlon Steiner, Willi Poh, Yinzhe Shen, Royden Wagner, Felix Hauser, Dominik Strutz, Jaime Villa, Gleb Stepanov, Holger Caesar, Ömer Şahin Taş, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller

General AI

Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cam…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

2026-06-08 · Ziqian Zhong, Ivgeni Segal, Ivan Bercovich, Shashwat Saxena, Kexun Zhang, Aditi Raghunathan

General AI

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both leaderboard rankings a…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion

2026-06-10 · Yuchen Xian, Yunqiu Xu, Yang He, Yi Yang

General AI

Multimodal image fusion aims to integrate complementary information from different modalities into a fused image that preserves rich local details while maintaining globally consistent appearance. Existing approaches build shared representations on 2D feature grids, which excel at modeling local structures but offer li…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.5

JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence

2026-06-10 · Dingyu Yao, Junhao Zhou, Chenxu Yang, Chuanyu Qin, Haowen Hou, Zheming Liang, Congcong Wang, Yuhang Cao, Shenglong Ye, Shuai Xie, Shuhuan Gu, Haoyang Huang, Qingyi Si, Nan Duan, Jiaqi Wang

General AI

Many moments in the real world do not wait for a user to ask. A fire starts on a security monitor, an expression flickers across a video call, or a product a viewer wants flashes by in a livestream. Yet today's large models remain mostly turn-based by design: they answer only when addressed, and even video-call apps th…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.4

Large-Scale Tunnel Air-Ground Collaboration With FLISP: Fast LiDAR-IMU Synchronized Path Planner

2026-06-25 · Fenghe Guo, Runjie Shen, Chenyang Sun, Junrui Zhang, Quanxi Zhan, Yongchun Wang, Junjie Zhang

General AI

Hydropower tunnel inspection is critical for infrastructure integrity yet remains inefficient and hazardous using manual methods. We propose FLISP (Fast LiDAR-IMU Synchronized Path Planner), a mapless planning framework for cooperative UGV-UAV inspection. Unlike traditional map-based paradigms, FLISP features three cor…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation

2026-03-23 · Donald Shenaj, Federico Errica, Antonio Carta

General AI

Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the pers…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis

2026-03-24 · Wei Sun, Ting Wang, Xinran Tian, Wanshun Lan, Xuhan Feng, Haoyue Li, Fangxin Wang

General AI

Existing LLM-based Kubernetes diagnostic systems cannot learn from operational experience, operating on static knowledge bases without improving from past resolutions. We present MetaKube, an experience-aware LLM framework through three synergistic innovations: (1) an Episodic Pattern Memory Network (EPMN) that abstrac…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Beyond Benchmarks: How Users Evaluate AI Chat Assistants

2026-03-26 · Moiz Sadiq Awan, Muhammad Haris Noor, Muhammad Salman Munaf

Research Track A · General AI

Automated benchmarks dominate the evaluation of large language models, yet no systematic study has compared user satisfaction, adoption motivations, and frustrations across competing platforms using a consistent instrument. We address this gap with a cross-platform survey of 388 active AI chat users, comparing satisfac…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Hierarchy-Guided Multimodal Representation Learning for Taxonomic Inference

2026-03-26 · Sk Miraj Ahmed, Xi Yu, Yunqi Li, Yuewei Lin, Wei Xu

General AI

Accurate biodiversity identification from large-scale field data is a foundational problem with direct impact on ecology, conservation, and environmental monitoring. In practice, the core task is taxonomic prediction - inferring order, family, genus, or species from imperfect inputs such as specimen images, DNA barcode…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

RefAlign: Representation Alignment for Reference-to-Video Generation

2026-03-26 · Lei Wang, YuXin Song, Ge Wu, Haocheng Feng, Hang Zhou, Jingdong Wang, Yaxing Wang, Jian Yang

General AI

Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applications such as personalized advertising and virtual try-on. In practice, existing R2V methods typically introduce additional high-level seman…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

KAT-Coder-V2 Technical Report

2026-03-29 · Fengxiang Li, Han Zhang, Haoyang Huang, Jinghui Wang, Jinhua Hao, Kun Yuan, Mengtong Li, Minglei Zhang, Pengcheng Xu, Wenhao Zhuang, Yizhen Shao, Zongxian Feng, Can Tang, Chao Wang, Chengxiao Tong, Fan Yang, Gang Xiong, Haixuan Gao, Han Gao, Hao Wang, Haochen Liu, Hongliang Sun, Jiabao Li, Jingwen Chang, Jun Du, Junyi Peng, Leizhen Cui, Meimei Jing, Mingqi Wu, Shangpeng Yan, Shaotong Qi, Suzhe Xu, Wenxuan Zhao, Xianda Sun, Xuan Xie, Yanbo Wang, Yao Xia, Yinghan Cui, Yingpeng Chen, Yong Wang, Yuze Shi, Zhiwei Shen, Ziyu Wang, Ming Sun, Lin Ye, Bin Chen

General AI

We present KAT-Coder-V2, an agentic coding model developed by the KwaiKAT team at Kuaishou. KAT-Coder-V2 adopts a "Specialize-then-Unify" paradigm that decomposes agentic coding into five expert domains - SWE, WebCoding, Terminal, WebSearch, and General - each undergoing independent supervised fine-tuning and reinforce…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN

2026-03-30 · Gabriele Gemmi, Michele Polese, Tommaso Melodia

General AI

The large-scale deployment of 5G networks has not delivered the expected return on investment for mobile network operators, raising concerns about the economic viability of future 6G rollouts. At the same time, surging demand for Artificial Intelligence (AI) inference and training workloads is straining global compute …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

DRIVE-Nav: Directional Reasoning, Inspection, and Verification for Efficient Open-Vocabulary Navigation

2026-03-30 · Maoguo Gao, Zejun Zhu, Zhiming Sun, Zhengwei Ma, Longze Yuan, Zhongjing Ma, Zhigang Gao, Jinhui Zhang, Suli Zou

General AI

Open-Vocabulary Object Navigation (OVON) requires an embodied agent to locate a language-specified target in unknown environments. Existing zero-shot methods often reason over dense frontier points under incomplete observations, causing unstable route selection, repeated revisits, and unnecessary action overhead. We pr…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Dynamic Lookahead Distance via Reinforcement Learning-Based Pure Pursuit for Autonomous Racing

2026-03-30 · Mohamed Elgouhary, Amr S. El-Wakeel

General AI

Pure Pursuit (PP) is a widely used path-tracking algorithm in autonomous vehicles due to its simplicity and real-time performance. However, its effectiveness is sensitive to the choice of lookahead distance: shorter values improve cornering but can cause instability on straights, while longer values improve smoothness …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Graphilosophy: Graph-Based Digital Humanities Computing with The Four Books

2026-03-30 · Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khanh-Duy Le, Minh-Triet Tran, Tam V. Nguyen, Trung-Nghia Le

General AI

The Four Books have shaped East Asian intellectual traditions, yet their multi-layered interpretive complexity limits their accessibility in the digital age. While traditional bilingual commentaries provide a vital pedagogical bridge, computational frameworks are needed to preserve and explore this wisdom. This paper b…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

GreenFLag: A Green Agentic Approach for Energy-Efficient Federated Learning

2026-03-31 · Theodora Panagea, Nikolaos Koursioumpas, Lina Magoula, Ramin Khalili

General AI

As mobile networks progress toward a new generation, there is a clear focus on integrating distributed intelligence across the system to drive performance, autonomy, and real-time adaptability. Federated learning (FL) stands out as a key emerging technique, enabling on-device model training while preserving data …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

SkillReducer: Optimizing LLM Agent Skills for Token Efficiency

2026-03-31 · Yudong Gao, Zongjie Li, Yuanyuanyuan, Zimo Ji, Pingchuan Ma, Shuai Wang

General AI

LLM-based coding agents rely on skills, pre-packaged instruction sets that extend agent capabilities, yet every token of skill content injected into the context window incurs both monetary cost and attention dilution. To understand the severity of this problem, we conduct a large-scale empirical study of 55,315 …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Adapting Text LLMs to Speech via Multimodal Depth Up-Scaling

2026-04-01 · Kazuki Yano, Jun Suzuki, Shinji Watanabe

General AI

Adapting pre-trained text Large Language Models (LLMs) into Speech Language Models (Speech LMs) via continual pretraining on speech data is promising, but often degrades the original text capabilities. We propose Multimodal Depth Upscaling, an extension of an emerging strategy in continual LLM pre-training, where new t…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs

2026-04-02 · Abinitha Gourabathina, Inkit Padhi, Manish Nagireddy, Subhajit Chaudhury, Prasanna Sattigeri

General AI

For Large Language Models (LLMs) to be reliably deployed, models must effectively know when not to answer: abstain. Reasoning models, in particular, have gained attention for impressive performance on complex tasks. However, reasoning models have been shown to have worse abstention abilities. Taking the vulnerabilities…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Impact of Multimodal and Conversational AI on Learning Outcomes and Experience

2026-04-02 · Karan Taneja, Anjali Singh, Ashok K. Goel

General AI

Multimodal Large Language Models (MLLMs) offer an opportunity to support multimedia learning through conversational systems grounded in educational content. However, while conversational AI is known to boost engagement, its impact on learning in visually-rich STEM domains remains under-explored. Moreover, there is limi…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

VOID: Video Object and Interaction Deletion

2026-04-02 · Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan, Ta-Ying Cheng

General AI

Existing video object removal methods excel at inpainting content "behind" the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions, such as collisions with other objects, current models fail to correct them and produce impl…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Debiasing LLMs by Fine-tuning

2026-04-03 · Zhenyu Gao, Wenxi Jiang, Yutong Yan

General AI

Prior research shows that large language models (LLMs) exhibit systematic extrapolation bias when forming predictions from both experimental and real-world data, and that prompt-based approaches appear limited in alleviating this bias. We propose a supervised fine-tuning (SFT) approach that uses Low-Rank Adaptation (Lo…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters

2026-04-03 · Haotian Xiang, Bingcong Li, Qin Lu

General AI

When deploying large language models (LLMs) to safety-critical applications, uncertainty quantification (UQ) is of utmost importance to self-assess the reliability of the LLM-based decisions. However, such decisions typically suffer from overconfidence, particularly after parameter-efficient fine-tuning (PEFT) for down…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Agentic Federated Learning: The Future of Distributed Training Orchestration

2026-04-06 · Rafael O. Jarczewski, Gabriel U. Talasso, Leandro Villas, Allan M. de Souza

General AI

Although Federated Learning (FL) promises privacy and distributed collaboration, its effectiveness in real-world scenarios is often hampered by the stochastic heterogeneity of clients and unpredictable system dynamics. Existing static optimization approaches fail to adapt to these fluctuations, resulting in resource un…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Analyzing Symbolic Properties for DRL Agents in Systems and Networking

2026-04-06 · Mohammad Zangooei, Jannis Weil, Amr Rizk, Mina Tahmasbi Arashloo, Raouf Boutaba

General AI

Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congestion control. For safe deployment, however, it is critical to reason about how agents behave across the range of system st…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Gym-Anything: Turn any Software into an Agent Environment

2026-04-07 · Pranjal Aggarwal, Graham Neubig, Sean Welleck

General AI

Computer-use agents hold the promise of assisting in a wide range of digital economic activities. However, current research has largely focused on short-horizon tasks over a limited set of software with limited economic value, such as basic e-commerce and OS-configuration tasks. A key reason is that creating environmen…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery

2026-04-07 · Hao Chen, Fang Qiu, Fangchao Dong, Defei Yang, Eve Bohnett, Li An

General AI

This study proposes a lightweight multimodal adaptation framework to bridge the representation gap between RGB-pretrained VLMs and thermal infrared imagery, and demonstrates its practical utility using a real drone-collected dataset. A thermal dataset was developed from drone-collected imagery and was used to fine-tune…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

MAESTRO: Adapting GUIs and Guiding Navigation with User Preferences in Conversational Agents with GUIs

2026-04-07 · Sangwook Lee, Sang Won Lee, Adnan Abbas, Young-Ho Kim, Yan Chen

General AI

Modern task-oriented chatbots present GUI elements alongside natural-language dialogue, yet the agent's role has largely been limited to interpreting natural-language input as GUI actions and following a linear workflow. In preference-driven, multi-step tasks such as booking a flight or reserving a restaurant, earlier …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

PSI: Shared State as the Missing Layer for Coherent AI-Generated Instruments in Personal AI Agents

2026-04-09 · Zhiyuan Wang, Erzhen Hu, Mark Rucker, Laura E. Barnes

General AI

Personal AI tools can now be generated from natural-language requests, but they often remain isolated after creation. We present PSI, a shared-state architecture that turns independently generated modules into coherent instruments: persistent, connected, and chat-complementary artifacts accessible through both GUIs and…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions

2026-04-12 · Wenhao Zhang, Lin Mu, Li Ni, Peiquan Jin, Yiwen Zhang

General AI

Low-rank adaptation (LoRA) is a widely used strategy for efficient fine-tuning of large language models (LLMs), but its strictly linear structure fundamentally limits expressive capacity. The bilinear formulation of weight updates captures only first-order dependencies between low-rank factors, restricting the modeling…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Autonomous Diffractometry Enabled by Visual Reinforcement Learning

2026-04-13 · J. Oppliger, M. Stifter, A. Rüegg, I. Biało, L. Martinelli, P. G. Freeman, D. Prabhakaran, J. Zhao, Q. Wang, J. Chang

General AI

Automation underpins progress across scientific and industrial disciplines. Yet, automating tasks requiring interpretation of abstract visual information remains challenging. For example, crystal alignment strongly relies on humans with the ability to comprehend diffraction patterns. Here we introduce an autonomous syst…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts

2026-04-13 · Chenxi Qing, Junxi Wu, Zheng Liu, Yixiang Qiu, Hongyao Yu, Bin Chen, Hao Wu, Shu-Tao Xia

General AI

Large language models (LLMs) have recently become capable of generating highly fluent textual content. While they offer significant convenience to humans, they also introduce various risks, such as phishing and academic dishonesty. Numerous research efforts have been dedicated to developing algorithms for detecting AI-generated …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

2026-04-13 · Wei Zhao, Zhe Li, Peixin Zhang, Jun Sun

General AI

Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly inc…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Detecting Safety Violations Across Many Agent Traces

2026-04-13 · Adam Stein, Davis Brown, Hamed Hassani, Mayur Naik, Eric Wong

General AI

To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare, complex, and sometimes even adversarially hidden and only detectable when multiple traces are analyzed together. These challenges arise in diverse settings such as misuse campa…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Dynamic Summary Generation for Interpretable Multimodal Depression Detection

2026-04-13 · Shiyu Teng, Jiaqing Liu, Hao Sun, Yu Li, Shurong Chai, Ruibo Hou, Tomoko Tateyama, Lanfen Lin, Yen-Wei Chen

General AI

Depression remains widely underdiagnosed and undertreated because stigma and subjective symptom ratings hinder reliable screening. To address this challenge, we propose a coarse-to-fine, multi-stage framework that leverages large language models (LLMs) for accurate and interpretable detection. The pipeline performs bin…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Multi-Task LLM with LoRA Fine-Tuning for Automated Cancer Staging and Biomarker Extraction

2026-04-14 · Jiahao Shao, Anam Nawaz Khan, Christopher Brett, Tom Berg, Xueping Li, Bing Yao

General AI

Pathology reports serve as the definitive record for breast cancer staging, yet their unstructured format impedes large-scale data curation. While Large Language Models (LLMs) offer semantic reasoning, their deployment is often limited by high computational costs and hallucination risks. This study introduces a paramet…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation

2026-04-16 · Zoe Fingleton, Nazanin Siavash, Armin Moin

General AI

In this paper, we focus on automating two of the widely used Verification and Validation (V&V) activities in the Software Development Lifecycle (SDLC): Software testing and software inspection (also known as review). Concerning the former, we concentrate on automated test case generation using Large Language Models (LL…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

2026-04-16 · Mélanie Roschewitz, Kenneth Styppa, Yitian Tao, Jiwoong Sohn, Jean-Benoit Delbrouck, Benjamin Gundersen, Nicolas Deperrois, Christian Bluethgen, Julia Vogt, Bjoern Menze, Farhad Nooralahzadeh, Michael Krauthammer, Michael Moor

General AI

Vision-language models (VLMs) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or re…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching

2026-04-16 · Aihua Li

General AI

Flow matching retains the generation quality of diffusion models while enabling substantially faster inference, making it a compelling paradigm for generative modeling. However, when applied to language modeling, it exhibits fundamental limitations in representing complex latent distributions with irregular geometries,…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

UrbanClipAtlas: A Visual Analytics Framework for Event and Scene Retrieval in Urban Videos

2026-04-16 · Joel Perca, Luis Sante, Juanpablo Heredia, Joao Rulff, Claudio Silva, Jorge Poco

General AI

Extracting actionable insights from long-duration urban videos is often labor-intensive: analysts must manually sift through raw footage to pinpoint target events or uncover broader behavioral trends. In this work, we present UrbanClipAtlas, a visual analytics system for exploring long urban videos recorded at street i…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

ECLASS-Augmented Semantic Product Search for Electronic Components

2026-04-21 · Nico Baumgart, Markus Lange-Hegermann, Jan Henze

General AI

Efficient semantic access to industrial product data is a key enabler for factory automation and emerging LLM-based agent workflows, where both human engineers and autonomous agents must identify suitable components from highly structured catalogs. However, the vocabulary mismatch between natural-language queries and a…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Epistemic orientation in parliamentary discourse is associated with deliberative democracy

2026-04-21 · Segun Aroyehun, Stephan Lewandowsky, David Garcia

General AI

The pursuit of truth is central to democratic deliberation and governance, yet political discourse reflects varying epistemic orientations, ranging from evidence-based reasoning grounded in verifiable information to intuition-based reasoning rooted in beliefs and subjective interpretation. We introduce a scalable appro…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

2026-04-21 · Boyan Shi, Wei Chen, Shuyuan Zhao, Junfeng Shen, Shengnan Guo, Shaojiang Wang, Huaiyu Wan

General AI

The combination of Mixture-of-Experts (MoE) and Low-Rank Adaptation (LoRA) has shown significant potential for enhancing the multi-task learning capabilities of Large Language Models. However, existing methods face two primary challenges: (1) Imprecise Routing in the current MoE-LoRA method fails to explicitly match inp…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Safe Continual Reinforcement Learning in Non-stationary Environments

2026-04-21 · Austin Coursey, Abel Diaz-Gonzalez, Marcos Quinones-Grueiro, Gautam Biswas

Research Track A · General AI

Reinforcement learning (RL) offers a compelling data-driven paradigm for synthesizing controllers for complex systems when accurate physical models are unavailable; however, most existing control-oriented RL methods assume stationarity and, therefore, struggle in real-world non-stationary deployments where system dynam…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback

2026-04-22 · Guotao Liang, Zhangcheng Wang, Juncheng Hu, Haitao Zhou, Ziteng Xue, Jing Zhang, Dong Xu, Qian Yu

General AI

Multimodal Large Language Models (MLLMs) have shown promising capabilities in generating Scalable Vector Graphics (SVG) via direct code synthesis. However, existing paradigms typically adopt an open-loop "blind drawing" approach, where models generate symbolic code sequences without perceiving intermediate visual outco…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

2026-04-22 · Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, Yu Feng

General AI

LLM agents have begun to find real security vulnerabilities that human auditors and automated fuzzers missed for decades, in source-available targets where the analyst can build and instrument the code. In practice the work is split among several agents, wired together by a harness: the program that fixes which roles e…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

2026-04-23 · Songen Gu, Yuhang Zheng, Weize Li, Yupeng Zheng, Yating Feng, Xiang Li, Yilun Chen, Pengfei Li, Wenchao Ding

General AI

Recently, end-to-end robotic manipulation models have gained significant attention for their generalizability and scalability. However, they often suffer from limited robustness to camera viewpoint changes when training with a fixed camera. In this paper, we propose VistaBot, a novel framework that integrates feed-forw…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

2026-04-23 · Pegah Khayatan, Jayneel Parekh, Arnaud Dapogny, Mustafa Shukor, Alasdair Newson, Matthieu Cord

General AI

Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the vision backbone or the dominance of the…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments

2026-04-24 · Hong Su

General AI

Autonomous robots operating in open environments need the ability to continuously handle tasks that are not covered by predefined local methods. However, existing approaches often rely on repeated large-language-model (LLM) interaction for uncovered tasks, and even successful executions or observed successful external …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices

2026-04-24 · Jia Li, Hongyi Deng, Yiran Zhang, Kechi Zhang, Tianqi Shao, Tiankuo Zhao, Weinan Wang, Zhi Jin, Ge Li, Yang Liu, Yingtao Fang, Yihong Dong

General AI

Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress using Large Language Models (LLMs) for code generation. Many benchmarks like HumanEval and EvoCodeBench have been created to evaluate LLMs by requiring them to generate code fr…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Don't Pause! Every prediction matters in a streaming video

2026-04-27 · Dibyadip Chatterjee, Zhanzhong Pang, Fadime Sener, Yale Song, Angela Yao

General AI

Streaming video models should respond the moment an event unfolds, not after the moment has passed. Yet existing online VideoQA benchmarks remain largely retrospective. They pause the video at fixed timestamps, pose questions about current or past events, and score models only at those moments. This protocol leaves str…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model

2026-04-27 · Sinin Zhang, Yunfei Xie, Yuxuan Cheng, Haoyu Zhang, Tong Zhang

General AI

Vision-Language Models (VLMs) have demonstrated strong performance on textbook-style physics problems, yet they frequently fail when confronted with dynamic real-world scenarios that require temporal consistency and causal reasoning across frames. We identify two fundamental challenges underlying these failures: (1) sp…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning

2026-04-27 · Zijian Guo, İlker Işık, H. M. Sabbir Ahmad, Wenchao Li

General AI

Specification-guided reinforcement learning (RL) provides a principled framework for encoding complex, temporally extended tasks using formal specifications such as linear temporal logic (LTL). While recent methods have shown promising results, their ability to generalize across unseen specifications and diverse enviro…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum

2026-04-28 · Chu-Cheng Lin, Eugene Ie

General AI

Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewards (RLVR) when the initial success probability $p_0$ is small. Using the Tsallis $q$-logarithm, we define a loss family $J_Q$ that interpolates between RLVR (at $q=0$…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Learning Generalizable Multimodal Representations for Software Vulnerability Detection

2026-04-28 · Zeming Dong, Yuejun Guo, Qiang Hu, Yao Zhang, Maxime Cordy, Hao Liu, Mike Papadakis, Yongqiang Lyu

General AI

Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information em…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Three Models of RLHF Annotation: Extension, Evidence, and Authority

2026-04-28 · Steve Coyne

General AI

Preference-based alignment methods, most prominently Reinforcement Learning with Human Feedback (RLHF), use the judgments of human annotators to shape large language model behaviour. However, the normative role of these judgments is rarely made explicit. I distinguish three conceptual models of that role. The first is …

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Towards Agentic Investigation of Security Alerts

2026-04-28 · Even Eilertsen, Vasileios Mavroeidis, Gudmund Grov

General AI

Security analysts are overwhelmed by the volume of alerts and the low context provided by many detection systems. Early-stage investigations typically require manual correlation across multiple log sources, a task that is usually time-consuming. In this paper, we present an experimental, agentic workflow that leverages…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Variational Neural Belief Parameterizations for Robust Dexterous Grasping under Multimodal Uncertainty

2026-04-28 · Clinton Enwerem, Shreya Kalyanaraman, John S. Baras, Calin Belta

General AI

Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poo…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Degree-dependent and distance-dependent contact rates interpolate between explosive, exponential and polynomial epidemic growth

2026-04-29 · Zylan Benjert, Júlia Komjáthy, Johannes Lengler, John Lapinskas, Ulysse Schaller

General AI

It is a fundamental question in epidemiology to estimate, model and predict the growth rate of a pandemic. Analogously, analysing the diffusion of innovation, (fake) news, memes, and rumours is of key importance in the social sciences. The resulting epidemic growth curves can be classified according to their growth rat…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Domain-Adapted Small Language Models for Reliable Clinical Triage

2026-04-29 · Manar Aljohani, Brandon Ho, Kenneth McKinley, Dennis Ren, Xuan Wang

General AI

Accurate and consistent Emergency Severity Index (ESI) assignment remains a persistent challenge in emergency departments, where highly variable free-text triage documentation contributes to mistriage and workflow inefficiencies. This study evaluates whether open-source small language models (SLMs) can serve as reliabl…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel

2026-04-29 · Yiqi Liu, Noelle Crawford, Michael Wang, Jilong Xue, Jian Huang

General AI

To overcome the well-known memory bottleneck of AI chips, 3D stacked architectures that employ advanced packaging technology with high-density through-silicon vias (TSVs) pins have proven to be a promising solution. The 3D-stacked AI chip enables ultra-high memory bandwidth between compute and memory by stacking numero…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis

2026-04-30 · Lincan Li, Zheng Chen, Yushun Dong

General AI

Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging. Existing graph construction methods, whether correlation-based or learning-based, often generate redundant or irrelevant edges due to the noisy nature of EEG data. Thi…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.3

HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs

2026-05-01 · Jinpai Zhao, Nishant Panda, Yen Ting Lin, Eirik Valseth, Diane Oyen, Clint Dawson

General AI

We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closures, boundary handling) in a query-conditioned way. Rather than learning a monolithic map, HyCOP learns a policy over short programs - which module to apply and for how l…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Parameter-Efficient Fine-Tuning with Learnable Rank

2026-06-03 · Arpit Garg, Simon Lucey, Hemanth Saratchandran

General AI

Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method that restricts weight updates to low-rank adapters, introducing a fixed low-rank inductive bias by optimizing in a low-dimensional subspace. In this work, we question whether a fixed-rank constraint is the most effective inductive bia…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

SentinelBench: A Benchmark for Long-Running Monitoring Agents

2026-06-03 · Matheus Kunzler Maldaner, Adam Fourney, Amanda Swearngin, Gagan Bansal, Maya Murad, Rafah Hosn, Saleema Amershi, Hussein Mozannar

General AI

AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default model of agent behavior is continuous action: issuing tool calls, refreshing pages, searching for alternatives, or otherwise trying to force progress. This is the wrong approach for many long-running tasks, which ar…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

2026-06-04 · Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Tianjun Yao, Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Hao Li, Salman Khan, Zhiqiang Shen

General AI

As AI writing assistants become increasingly integrated into real-world drafting and revision workflows, many documents are no longer purely human-written or AI-generated, but instead result from progressive human-AI co-editing. However, existing AI-text detection benchmarks largely focus on final outputs and provide l…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Where Should Knowledge Enter? A Layered Framework for Knowledge Infusion in Multimodal Iterative Generative Mo

2026-06-04 · Renjith Prasad, Chathurangi Shyalika, Anushka Pawar, Amit Sheth

General AI

Multimodal generative models produce fluent outputs but remain unreliable when generation must respect structured, domain-specific, or safety-critical knowledge. Existing methods incorporate knowledge through mechanisms such as prompt augmentation, guidance, latent editing, or fine-tuning, yet they are typically catego…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

TraRA: Trajectory-level Recognition Aggregation for Video Text Spotting in Urban Surveillance

2026-06-05 · Duc Tri Tran, Trung Thanh Nguyen, Vijay John, Phi Le Nguyen, Yasutomo Kawanishi

General AI

Video Text Spotting (VTS) is essential for urban surveillance and intelligent transportation systems, enabling automated reading of street signs, vehicle markings, and scene text in video streams. However, reliable recognition remains challenging due to dynamic video factors common in surveillance scenarios, including …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Muon Learns More Robust and Transferable Features than Adam

2026-06-08 · Tianyu Ruan, Fengzhuo Zhang, Shuche Wang, Shihua Zhang

General AI

Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Despite its efficiency advantage over Adam and SGD, the feature-learning advantage of Muon remains unclear. This paper investigates Muon's feature-learning advantage through the lens of robustn…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Rethinking the Divergence Regularization in LLM RL

2026-06-08 · Jiarui Yao, Xiangxin Zhou, Penghui Qi, Wee Sun Lee, Liefeng Bo, Tianyu Pang

General AI

Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control essential for stable optimization. Mainstream methods such as PPO and GRPO approximate th…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

2026-06-08 · Wendy K. Tam

General AI

The ambition behind alignment training is to make large language models safe and useful. The primary mechanism, reinforcement learning from human feedback (RLHF), shapes the behavior of deployed language models by aligning them with "human values." Yet the process is opaque. What values are being encoded; whose value…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations

2026-06-09 · Junke Wang, Xiao Wang, Jiacheng Pan, Xuefeng Hu, Feng Li, Jingxiang Sun, Chaorui Deng, Zilong Chen, Yunpeng Chen, Kaibin Tian, Matthew Gwilliam, Hao Chen, Danhui Guan, Kun Xu, Weilin Huang, Zuxuan Wu, Haoqi Fan, Yu-Gang Jiang, Zhenheng Yang

General AI

This paper introduces ARM, a discrete representation-based AutoRegressive Model that unifies image understanding, generation, and editing within a next-token prediction framework. ARM is built on three efforts: first, we train a discrete semantic visual tokenizer that maps images into compact token sequences. Our token…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Monte Carlo Pass Search: Using Trajectory Generation for 3D Counterfactual Pass Evaluation in Football

2026-06-09 · Andrew Kang, Priya Narasimhan

General AI

We recast pass evaluation in football (soccer) as a Monte Carlo Tree Search (MCTS)-like evaluation problem whose components mostly exist in the literature under different names: a value model (possession value), a world model (multi-agent trajectories with ball interactions), and a policy over counterfactual actions (s…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

When to Align, When to Predict: A Phase Diagram for Multimodal Learning

2026-06-09 · Ilay Kamai, Hugues Van Assel, Aviv Regev, Hagai B. Perets, Randall Balestriero

General AI

Cross-modal alignment (CA) and cross-modal prediction (CP) are the dominant paradigms for multimodal representation learning, yet there is no systematic understanding of when each succeeds, when each fails, and when cross-modal training helps at all -- a gap that leaves practitioners, especially in scientific domains l…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

2026-06-11 · Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao, Fanjin Zhang, Jian Song, Lei Hou, Juanzi Li

General AI

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Clay-CNN Hybrids: Leveraging Geospatial Foundation Models as Auxiliary Context for Landslide Detection

2026-06-12 · Huong Binh Vu

General AI

Rapid post-event landslide mapping is essential for disaster response but remains difficult to automate due to extreme class imbalance. This study evaluates whether Clay v1.5, a Geospatial Foundation Model (GFM), can improve pixel-level landslide segmentation on the Landslide4Sense (L4S) benchmark, which contains 3,799…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

LatentGym: A Testbed For Cross-Task Experiential Learning With Controllable Latent Structure

2026-06-13 · Daksh Mittal, Tommaso Castellani, Thomson Yen, Naimeng Ye, Fangyu Wu, Minghui Chen, Tiffany Cai, Emmanouil Koukoumidis, William Zeng, Hongseok Namkoong

General AI

We envision continually learning agentic systems that become more useful over time: as they encounter sequences of related tasks, they should infer the hidden structure shared across those tasks and use it to improve future decisions. This cross-task experiential learning capability is pivotal in domains such as person…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Toward the Whole Picture: Accumulative Fingerprint Mapping and Reconstruction for Small-Area Mobile Sensors

2026-06-14 · Xiongjun Guan, Jianjiang Feng, Jie Zhou

Research Track A · General AI

Small-area fingerprint sensing on mobile devices creates a fundamental mismatch between acquisition and recognition: each touch captures only a tiny, pose-varying local patch, while reliable biometric matching ultimately requires a stable and sufficiently complete fingerprint representation. Existing pipelines largely …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

ExpRL: Exploratory RL for LLM Mid-Training

2026-06-15 · Violet Xiang, Amrith Setlur, Chase Blagden, Nick Haber, Aviral Kumar

General AI

Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as d…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers

2026-06-16 · Sajad Movahedi, Vera Milovanović, Shlomo Libo Feigin, Alexander Theus, Thomas Hofmann, Valentina Boeva, T. Konstantin Rusch, Antonio Orvieto

General AI

Looped architectures provide an inductive bias toward learning step-by-step procedures for tasks that require compositional reasoning. The number of effective layers reached by looping determines the quality of the solution these models find. Like deep architectures, looped architectures are prone to a signal propagati…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

Predicting Immune Biomarkers with MultiModal Mixture-of-Expert Pathology Foundation Models Empowers Precision Oncology

2026-06-16 · Tianyu Liu, Ziqing Wang, Zhaokang Liang, Tong Ding, Peter Humphrey, Lorraine Colón-Cartagena, Emily Ling-Lin Pai, Kenneth Tou En Chang, Mohamed Kahila, Jonathan Chong Kai Liew, Tinglin Huang, Rex Ying, Kaize Ding, Faisal Mahmood, Wengong Jin

General AI

Predicting immune biomarkers associated with the tumor immune microenvironment (TIME) is critical for advancing precision oncology, yet existing approaches are largely limited to single image modalities and suffer from insufficient resolution and incomplete utilization of complementary clinical and biological informati…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.3

CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts

2026-06-17 · Po-Han Cheng, Chia-Mu Yu, Ying-Dar Lin, Yu-Sung Wu, Wei-Bin Lee

General AI

Code large language models increasingly retrieve external code context from repositories, documentation, issue threads, and coding-agent environments, creating an indirect prompt-injection surface where attackers hide instructions in comments, strings, identifiers, or decoy code. We propose CodeSentinel, a three-layer …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.2

Grading the Grader: Lessons from Evaluating an Agentic Data Analysis System

2026-06-23 · Tian Zheng, Kai-Tai Hsu

General AI

Agentic data analysis systems produce rich outputs, including code, numerical results, and verbal diagnostics. This makes them more challenging to evaluate than single-turn LLM responses. It is therefore necessary to distinguish genuine disagreement between an agent's output and a ground-truth answer from grading artif…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.2

Vision-Language Model Reasoning for Contextual Semantic Mapping in Intralogistics

2026-06-23 · Marvin Rüdt, Hao Pang, Constantin Enke, Zäzilia Seibold, Kai Furmans

General AI

Autonomous mobile robots operating in intralogistics environments rely on geometric maps for localization and navigation, but lack semantic understanding of objects and their contextual properties. We present a contextual semantic mapping pipeline that combines SLAM-based geometric mapping, SAM-based instance segmentat…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.2

Cross-Attention Multimodal Learning for Predicting Response to Neoadjuvant Imatinib in Gastrointestinal Stromal Tumors: A Multicenter Retrospective Study

2026-06-24 · Fariba Tohidinezhad, Douwe J. Spaanderman, Natalia Oviedo Acosta, Kaouther Mouheb, Karthik Prathaban, David F. Hanff, Dirk J. Grünhagen, Cornelis Verhoef, Joris M. van Sabben, Evelyne Roets, Jette J. Slettenhaar, Hans Gelderblom, Ingrid M. E. Desar, Anna K. L. Reyners, Neeltje Steeghs, Stefan Klein, Martijn P. A. Starmans

General AI

Background: Response to neoadjuvant imatinib in gastrointestinal stromal tumors (GISTs) is highly variable and cannot be reliably predicted using current clinical or molecular markers. This study developed and evaluated an explainable multimodal deep learning framework integrating computed tomography (CT) imaging and c…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.2

Helpful or Harmful? Evaluating LLM-Assisted Vulnerability Patching via a Human Study

2026-06-24 · Giulian Biolo, Michael Tezza, Yuanjun Gong, Fabio Massacci

General AI

Software vulnerability remediation is a cognitively demanding task that requires specialized security expertise often lacking in general developers. In the meantime, Large Language Model (LLM)-assisted tools show potential in vulnerability detection, location, and repair tasks. [Hypothesis:] While LLM-assistance is h…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.2

MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation

2026-06-24 · JoungBin Lee, Jaewoo Jung, Jongmin Lee, Tongmin Kim, Hyunsung Kim, Takuya Narihira, Kazumi Fukuda, Jahyeok Koo, Jisang Han, Yuki Mitsufuji, Seungryong Kim

General AI

Synthesizing a novel-view video from a monocular reference video along a target camera trajectory requires both geometric consistency and motion fidelity with respect to the reference video. Existing methods based on explicit 3D representations are limited by the accuracy of off-the-shelf reconstruction modules, which …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.2

Weave of Formal Thought

2026-06-24 · Alexandre Bouayad

General AI

Large language models (LLMs) attain remarkable surface fluency on code, yet they neither formally guarantee the syntactic validity of their output nor leverage the hierarchical structure defining the target language. While existing constrained-decoding frameworks address the former, they operate under rigid assumptions…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.0

Anansi: Scalable Characterization of Message-Based Job Scams

2026-02-27 · Abisheka Pitumpe, Amir Rahmati

Research Track B · General AI

Job-based smishing scams, where victims are recruited under the guise of remote job opportunities, represent a rapidly growing and understudied threat within the broader landscape of online fraud. In this paper, we present Anansi, the first scalable, end-to-end measurement pipeline designed to systematically engage wit…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.0

Associative Constructive Evolution: Enhancing Metaheuristics through Hebbian-Learned Generative Guidance

2026-03-31 · Shanxian Lin, Yuichi Nagata, Haichuan Yang

Research Track A

Metaheuristic algorithms such as Particle Swarm Optimization (PSO) and Evolutionary Algorithms (EA) excel at exploring solution spaces but lack mechanisms to accumulate and reuse procedural knowledge from successful search trajectories. This paper proposes Associative Constructive Evolution (ACE), a framework that enha…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 8.0

NetSecBed: A Container-Native Testbed for Reproducible Cybersecurity Experimentation

2026-04-05 · Leonardo Bitzki, Diego Kreutz, Tiago Heinrich, Douglas Fideles, Leandro Bertholdo, Silvio Quincozes, Angelo Diniz

Research Track A

Cybersecurity research increasingly depends on reproducible evidence, such as traffic traces, logs, and labeled datasets, yet most public datasets remain static and offer limited support for controlled re-execution and traceability, especially in heterogeneous multi-protocol environments. This paper presents NetSecBed,…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.0

Failure Ontology: A Lifelong Learning Framework for Blind Spot Detection and Resilience Design

2026-04-12 · Yuan Sun, Hong Yi, Jinyuan Liu

Research Track A

Personalized learning systems are almost universally designed around a single objective: help people acquire knowledge and skills more efficiently. We argue this framing misses the more consequential problem. The most damaging failures in human life (financial ruin, health collapse, professional obsolescence) are rarely …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.0

Physics-Informed Neural Networks for Methane Sorption: Cross-Gas Transfer Learning, Ensemble Collapse Under Physics Constraints, and Monte Carlo Dropout Uncertainty Quantification

2026-04-15 · Mohammad Nooraiepour, Zezhang Song, Wei Li, Sarah Perez

Research Track A

Accurate methane sorption prediction across heterogeneous coal ranks requires models that combine thermodynamic consistency, efficient knowledge transfer across data-scarce geological systems, and calibrated uncertainty estimates, capabilities that are rarely addressed together in existing frameworks. We present a phys…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.0

MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining

2026-04-27 · Phung Gia Huy, Hai An Vu, Minh-Phuc Truong, Thang Duc Tran, Linh Ngo Van, Thanh Hong Nguyen, Trung Le

Research Track A · General AI

Representation learning is fundamental to NLP, but building embeddings that work well at different computational budgets is challenging. Matryoshka Representation Learning (MRL) offers a flexible inference paradigm through nested embeddings; however, learning such structures requires explicit coordination of how inform…

Review: pending
Role: unreviewed
Read: now

Open source Details

huggingface Score 8.0

Perceptual Flow Network for Visually Grounded Reasoning

2026-05-04 · Yangfu Li, Yuning Gong, Hongjian Zhan, Teng Li, Yuanhuiyi Lyu, Tianyi Chen, Qi Liu, Ziyuan Huang, Zhihang Zhong, Dandan Zheng, Yue Lu

General AI

Despite the success of Large Vision-Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods introduce geometric priors from visual experts as additional supervision. However, we obs…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.0

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

2026-05-18 · Woongyeng Yeo, Yumin Choi, Taekyung Ki, Sung Ju Hwang

General AI

Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rewards reveal whether a task succeeds, but not which intermediate actions caused the outcome or how they should be corrected. Recent methods alleviate this issue by generating rewards or textual hints from turn-level act…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 8.0

Continual Segmentation under Joint Nonstationarity

2026-05-19 · Prashant Pandey, Himanshu Kumar, Devineni Sri Venkatraya Chowdary, Brejesh Lall

Research Track A

Evolving data streams induce joint nonstationarity in continual semantic segmentation, where semantic classes, input distributions, and supervision availability change simultaneously over time. This setting reflects practical structured prediction systems, yet remains largely unexplored in prior continual learning work…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.0

From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

2026-05-19 · Juncheng Wu, Hardy Chen, Haoqin Tu, Xianfeng Tang, Freda Shi, Hui Liu, Hanqing Lu, Cihang Xie, Yuyin Zhou

General AI

Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this work, we systematically study the interplay between perception and reasoning in VLM …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.0

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

2026-05-26 · Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee

General AI

Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behavio…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.0

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

2026-05-29 · Jian Mu, Tianyi Lin, Chengwei Qin, Zhongxiang Dai, Yao Shu

General AI

Large language models are increasingly deployed in multi-turn interactive settings where users or environments can iteratively provide lightweight feedback. Unfortunately, optimizing such behavior presents a sharp dilemma in practice: online reinforcement learning is able to effectively address multi-turn dynamics but …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.0

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

2026-05-29 · Zhenhao Yang, Xiaoshi Wu, Zhengyao Lv, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Kun Gai, Kwan-Yee K. Wong

General AI

Recent advances in video generative models have promoted rapid progress in controllable world models. However, maintaining fine-grained spatio-temporal consistency under long-horizon reasoning remains a key challenge. In this work, we move beyond explicit 3D memory and coarse frame-level implicit modeling, and propose …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 8.0

Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

2026-06-15 · Hyungmin Kim, Minsoo Kim, Hongseok Kim, Jungwook Choi

Research Track A · General AI

Multi-turn LLM serving accumulates dialogue history whose Key-Value (KV) cache grows with every turn and every user, quickly exceeding the model weights themselves and making memory -- not compute -- the binding constraint on throughput. Non-uniform KV compression, which allocates heterogeneous budgets across attention…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

2026-04-28 · Shuxiang Cao, Zijian Zhang, Abhishek Agarwal, Grace Bratrud, Niyaz R. Beysengulov, Daniel C. Cole, Alejandro Gómez Frieiro, Elena O. Glen, Hao Hsu, Gang Huang, Raymond Jow, Greshma Shaji, Tom Lubowe, Ligeng Zhu, Luis Mantilla Calderón, Nicola Pancotti, Joel Pendleton, Brandon Severin, Charles Etienne Staub, Sara Sussman, Antti Vepsäläinen, Neel Rajeshbhai Vora, Yilun Xu, Varinia Bernales, Daniel Bowring, Elica Kyoseva, Ivan Rungger, Giulia Semeghini, Sam Stanwyck, Timothy Costa, Alán Aspuru-Guzik, Krysta Svore

Research Track A · General AI

Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEval, the first VLM benchmark for quantum …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters

2026-05-04 · Lingxiao Kong, Cong Yang, Oya Deniz Beyan, Zeyd Boukhers

General AI

Despite significant advances in Reinforcement Learning (RL), model performance remains highly sensitive to algorithm and hyperparameter configurations, while generalization gaps across environments complicate real-world deployment. Although prior work has studied RL generalization, the relative contribution of specific…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

2026-05-04 · Vicente Pelechano, Antoni Mestre, Manoli Albert, Miriam Gil

General AI

Deciding how to distribute work between humans and AI systems is a central challenge in organisational design. Most approaches treat this as a binary choice, yet the operational reality is richer: humans and AI routinely share tasks or take complementary roles depending on context, fatigue, and the stakes involved. Gov…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

MolmoAct2: Action Reasoning Models for Real-world Deployment

2026-05-04 · Haoquan Fang, Jiafei Duan, Donovan Clay, Sam Wang, Shuo Liu, Weikai Huang, Xiang Fan, Wei-Chuan Tsai, Shirui Chen, Yi Ru Wang, Shanli Xing, Jaemin Cho, Jae Sung Park, Ainaz Eftekhar, Peter Sushko, Karen Farley, Angad Wadhwa, Cole Harrison, Winson Han, Ying-Chun Lee, Eli VanderBilt, Rose Hendrix, Suveen Ellawela, Lucas Ngoo, Joyce Chai, Zhongzheng Ren, Ali Farhadi, Dieter Fox, Ranjay Krishna

General AI

Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency fo…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Virtual Scanning for NSCLC Histology: Investigating the Discriminatory Power of Synthetic PET

2026-05-04 · Fatih Aksu, Laura Ciuffetti, Francesco Di Feola, Filippo Ruffini, Giulia Romoli, Fabrizia Gelardi, Arturo Chiti, Valerio Guarrasi, Paolo Soda

General AI

Accurate histological differentiation between adenocarcinoma (ADC) and squamous cell carcinoma (SCC) is critical for personalized treatment in non-small cell lung cancer (NSCLC). While [¹⁸F]FDG PET/CT is a standard tool for the clinical evaluation of lung cancer, its utility is often limited by high costs and radi…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

EMO: Pretraining Mixture of Experts for Emergent Modularity

2026-05-07 · Ryan Wang, Akshita Bhagia, Sewon Min

General AI

Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset of experts per inpu…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Quantifying Trade-Offs Between Stability and Goal-Obfuscation

2026-05-07 · Yixuan Wang, Dan Guralnik, Warren Dixon

General AI

Safety-critical autonomy in adversarial settings demands more than Lyapunov stability of tracking error signals. An agent executing a goal-directed trajectory is intrinsically legible to a passive observer running online Bayesian inference, because the contractive dynamics of any Lyapunov basin of attraction concentrat…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models

2026-05-07 · Amir Ivry

General AI

Large audio language models (LALMs) are increasingly used to reason over long audio clips, yet deployment often compresses audio before inference to reduce memory and latency. The risk is that compression can leave aggregate accuracy acceptable while sharply degrading answers for a deployment-critical query family. We …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

From Web to Pixels: Bringing Agentic Search into Visual Perception

2026-05-12 · Bokang Yang, Xinyi Sun, Kaituo Feng, Xingping Dong, Dongming Wu, Xiangyu Yue

General AI

Visual perception connects high-level semantic understanding to pixel-level perception, but most existing settings assume that the decisive evidence for identifying a target is already in the image or frozen model knowledge. We study a more practical yet harder open-world case where a visible object must first be resol…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

2026-05-12 · Gunjan, Sidahmed Benabderrahmane, Talal Rahwan

General AI

Large Language Models (LLMs) can generate fluent political text at scale, raising concerns about synthetic discourse during crises and social conflict. Existing AI-text detection often focuses on sentence-level cues such as perplexity, burstiness, or token irregularities, but these signals may weaken as generative syst…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Inferential Privacy Leakage in Anonymized Conversational AI Logs

2026-05-22 · S M Mehedi Zaman, Kiran Garimella

General AI

Hundreds of millions of users now hold detailed, multi-turn conversations with ChatGPT and similar LLM assistants. We measure two privacy-relevant features of these conversations on a corpus of complete ChatGPT histories donated by over 1,000 users in four Global South countries (Brazil, India, Nigeria, Pakistan). Firs…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Gram: Assessing sabotage propensities via automated alignment auditing

2026-05-28 · David Lindner, Victoria Krakovna, Sebastian Farquhar

General AI

We introduce Gram, an automated alignment auditing framework to assess the propensity of AI agents to engage in sabotage. We evaluate Gemini models across 17 simulated agentic deployment scenarios that incentivize sabotage. We find Gemini models misbehave in about 2-3% of our simulated trajectories. Many of these cases…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

On-Policy Replay for Continual Supervised Fine-Tuning

2026-05-28 · Yan Chen, Taojie Zhu, Meng Zhang, Xin Chen, Jiaqi Huang, Dongyang Xu, Yizhi Wang

Research Track A · General AI

Continual supervised fine-tuning (SFT) is the de facto recipe for adapting large language models (LLMs) to a stream of downstream tasks, but it suffers from catastrophic forgetting of earlier capabilities. Recent work shows that on-policy signals -- training on the model's own outputs -- reduce forgetting more reliably…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Can Generative AI help people navigate Radical Moral Disagreements? The CONSIDER prototype

2026-05-29 · William Hohnen-Ford, Sarah Chen, Kathryn B. Francis, Madeline G. Reinecke, Ilina Singh, David Lyreskog

General AI

Radical Moral Disagreements (RMDs) are highly polarising topics that are increasingly censored in everyday life, with growing evidence suggesting that this polarisation carries measurable costs to public mental health. To address these challenges, some researchers have proposed Large Language Models (LLMs) as a means t…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

CoFiDA-M: Concept-Aware Feature Modulation for Cross-Domain Adaptation with Image-Only Inference

2026-05-29 · Nurjahan Sultana, Moi Hoon Yap, Xinqi Fan, Wenqi Lu

General AI

Models for AI-based skin cancer screening suffer a severe performance drop when shifting from expert dermoscopic (source) images to consumer-grade clinical (target) images, hindering real-world deployment. Existing domain adaptation methods often ignore crucial semantic invariants, such as clinical concepts. While new …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Recognizing Co-Speech Gestures in-the-Wild

2026-05-29 · Sindhu B Hegde, K R Prajwal, Andrew Zisserman

General AI

While humans naturally gesture during speech, only a sparse subset of these movements are visually depictive and semantically linked to specific spoken words. Current multimodal models struggle to capture these semantic co-speech gestures, heavily bottlenecked by a lack of precisely annotated training data. To address …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Representation Forcing for Bottleneck-Free Unified Multimodal Models

2026-05-29 · Yuqing Wang, Zhijie Lin, Ceyuan Yang, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Zihan Ding, Fuyun Wang, Shuai Wang, Youliang Zhang, Haoqi Fan, Xihui Liu

General AI

Unified multimodal models (UMMs) aim to handle perception and generation in a single model. Yet existing UMMs still rely on a frozen, separately pretrained VAE for image generation, imposing a structural bottleneck. Naively removing it introduces a quality gap, as the model must learn both high-level structure and low-…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models

2026-05-29 · Yibin Zhao, Fangxin Shang, Dingrui Yang, Yuqi Wang

General AI

Table question answering requires models to recover semantic relations encoded implicitly by two-dimensional layout, merged cells, and hierarchical headers. Current pipelines typically use HTML or Markdown as intermediate table representations, but these layout-oriented serializations introduce markup overhead and requ…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation

2026-05-29 · Ruotong Liao, Guowen Huang, Qing Cheng, Guangyao Zhai, Lei Zhang, Xun Xiao, Thomas Seidl, Daniel Cremers, Volker Tresp

General AI

Text-to-video (T2V) generation faces challenging questions when generating videos with long horizons containing multiple events. Inspired by the intrinsics of the diffusion process, we probe video diffusion transformers (DiTs) and uncover intrinsic turning points in the DiT denoising trajectory where conditioning text …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Pessimism's Paradox: Conservative Offline Training Amplifies Reward Hacking During Online Adaptation in Reasoning Models

2026-06-29 · Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary

General AI

Conservative offline training is widely advocated as a safe foundation for subsequent online adaptation: if a policy stays close to well-supported behaviour, the argument goes, it is less likely to exploit imperfections in a learned reward model. We challenge this intuition empirically and mechanistically. We train a Q…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.8

Residual-Guided Expert Specialization for Incomplete Multimodal Learning

2026-06-29 · Seunghun Baek, Jihwan Park, Jaeyoon Sim, Minjae Jeong, Hoseok Lee, Won Hwa Kim

General AI

As real-world prediction systems often face missing modalities at inference, incomplete multimodal learning (IML) remains a practical challenge. While prior methods aim to learn representations robust to missing inputs, representations from incomplete modalities inevitably deviate from their full-modality counterparts …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.6

G-RRM: Guiding Symbolic Solvers with Recurrent Reasoning Models

2026-07-02 · Timo Bertram, Sidhant Bhavnani, Richard Freinschlag, Erich Kobler, Andreas Mayr, Günter Klambauer

General AI

In this work, we focus on SE-RRMs, a symbol-equivariant instantiation of RRMs that exhibits improved extrapolation to larger problem sizes. We propose a neuro-symbolic approach, "Guiding with Recurrent Reasoning Models" (G-RRM), which integrates SE-RRMs with symbolic solvers for constraint satisfaction problems. SE-R…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.6

Learning Spectral and Polarimetric Clues for One-to-Multimodal Novel View Synthesis

2026-07-02 · Federico Lincetto, Gianluca Agresti, Mattia Rossi, Piergiorgio Sartor, Pietro Zanuttigh

General AI

Neural rendering techniques allow for accurate reconstruction of the geometry and color appearance of 3D scenes. Some methods have extended their use to additional imaging modalities, such as multispectral, infrared, or polarimetric data. However, all of these approaches require expensive sensors and calibrated setups …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Make Geometry Matter for Spatial Reasoning

2026-03-27 · Shihua Zhang, Qiuhong Shen, Shizun Wang, Tianbo Pan, Xinchao Wang

General AI

Empowered by large-scale training, vision-language models (VLMs) achieve strong image and video understanding, yet their ability to perform spatial reasoning in both static scenes and dynamic videos remains limited. Recent advances try to handle this limitation by injecting geometry tokens from pretrained 3D foundation…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation

2026-03-30 · Bharath Krishnamurthy, Ajita Rattani

General AI

Recent multimodal face generation models address the spatial control limitations of text-to-image diffusion models by augmenting text-based conditioning with spatial priors such as segmentation masks, sketches, or edge maps. This multimodal fusion enables controllable synthesis aligned with both high-level semantic int…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Therefore I am. I Think

2026-04-02 · Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani

General AI

We consider the question: when a large language reasoning model makes a choice, did it think first and then decide to, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

AURA: Always-On Understanding and Real-Time Assistance via Video Streams

2026-04-05 · Xudong Lu, Yang Bo, Jinpeng Chen, Shuhan Li, Xintong Guo, Huankang Guan, Fang Liu, Dunyuan Xu, Peiwen Sun, Heyang Sun, Rui Liu, Hongsheng Li

General AI

Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress, yet current approach…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

2026-04-06 · Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye

General AI

We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. Unlike traditional GPU-centric systems, MegaTrain stores parameters and optimizer states in host memory (CPU memory) and treats GPUs as transient compute engines. For each layer…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

2026-04-08 · Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu

General AI

A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jointly shaped by opti…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

2026-04-09 · Luozheng Qin, Jia Gong, Qian Qiao, Tianjiao Li, Li Xu, Haoyu Pan, Chao Qu, Zhiyu Tan, Hao Li

General AI

Unified multimodal models integrating visual understanding and generation face a fundamental challenge: visual generation incurs substantially higher computational costs than understanding, particularly for video. This imbalance motivates us to invert the conventional paradigm: rather than extending understanding-centr…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

2026-04-13 · Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar, Lasha Koroshinadze, Nishit Anand, Zhifeng Kong, Siddharth Gururani, Sang-gil Lee, Jaehyeon Kim, Aya Aljafari, Chao-Han Huck Yang, Sungwon Kim, Ramani Duraiswami, Dinesh Manocha, Mohammad Shoeybi, Bryan Catanzaro, Ming-Yu Liu, Wei Ping

General AI

We present Audio Flamingo Next (AF-Next), the next-generation and most capable large audio-language model in the Audio Flamingo series, designed to advance understanding and reasoning over speech, environmental sounds and music. Compared to Audio Flamingo 3, AF-Next introduces: (i) a stronger foundational audio-languag…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Artificial Intelligence Index Report 2026

2026-04-14 · Sha Sajadieh, Loredana Fattorini, Raymond Perrault, Yolanda Gil, Vanessa Parli, Lapo Santarlasci, Juan Pava, Nestor Maslej, Russ Altman, Erik Brynjolfsson, Carla Brodley, Jack Clark, Virginia Dignum, Vipin Kumar, James Landay, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Elham Tabassi, Russell Wald, Toby Walsh, Dan Weld

General AI

Welcome to the ninth edition of the AI Index report. As AI continues to advance rapidly, the question becomes whether the systems built around it can keep up. Governance frameworks, evaluation methods, education systems, and the data infrastructure needed to track AI's impact are struggling to match the pace of the tec…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.5

LightTune: Lightweight Forward-Only Online Fine-Tuning with Applications to Link Adaptation

2026-04-14 · Ramy E. Ali, Federico Penna

Research Track A

Deploying machine learning (ML) algorithms on mobile phones is bottlenecked by performance degradation under dynamic, real-world conditions that differ from the offline training conditions. While continual learning and adaptation are essential to mitigate this distributional shift, conventional online learning methods …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

2026-04-15 · Wangjie Gan, Miao Pan, Linbo Xi, Wenqi Zhang, Jintao Chen, Jianwei Yin, Xuhong Zhang

General AI

Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a speci…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Learning Adaptive Reasoning Paths for Efficient Visual Reasoning

2026-04-16 · Yixu Huang, Tinghui Zhu, Muhao Chen

General AI

Visual reasoning models (VRMs) have recently shown strong cross-modal reasoning capabilities by integrating visual perception with language reasoning. However, they often suffer from overthinking, producing unnecessarily long reasoning chains for any task. We attribute this issue to Reasoning Path Redundancy in visual…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models

2026-04-16 · Haoyi Sun, Xiaoxiao Wang, Ning Mao, Qian Wang, Lifu Mu, Wen Zheng, Tao Wei, Wei Chen

General AI

Vision-Language Models (VLMs) have shown remarkable capabilities in joint vision-language understanding, but their large scale poses significant challenges for deployment in resource-constrained scenarios. Knowledge Distillation (KD) offers a viable way to improve model capabilities without increasing model size or dat…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Qwen3.5-Omni Technical Report

2026-04-17 · Qwen Team

General AI

In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor, Qwen3.5-Omni scales to hundreds of billions of parameters and supports a 256k context length. By leveraging a massive dataset comprising heterogeneous text-vision pairs…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Speculative Decoding for Autoregressive Video Generation

2026-04-19 · Yuezhou Hu, Jintao Zhang

General AI

Autoregressive video diffusion is emerging as a promising paradigm for streaming video synthesis, with step distillation serving as the primary means of accelerating inference. Whether speculative decoding, the dominant acceleration strategy for large language models, can be effectively adapted to autoregressive video …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

2026-04-20 · Rongyuan Tan, Jue Zhang, Zhuozhao Li, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

General AI

Interpretability tools are increasingly used to analyze failures of Large Language Models (LLMs), yet prior work largely focuses on short prompts or toy settings, leaving their behavior on commonly used benchmarks underexplored. To address this gap, we study contrastive, LRP-based attribution as a practical tool for an…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

2026-04-21 · Chaitanya Dwivedi, Binxuan Huang, Himanshu Gupta, Pratik Jayarao, Neeraj Varshney, Bing Yin

General AI

Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality scales predictably with total parameters, an…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

2026-04-21 · Ying Zeng, Miaosen Luo, Guangyuan Li, Yang Yang, Ruiyang Fan, Linxiao Shi, Qirui Yang, Jian Zhang, Chengcheng Liu, Siming Zheng, Jinwei Chen, Bo Li, Peng-Tao Jiang

General AI

Traditional photographic image editing typically requires users to possess sufficient aesthetic understanding to provide appropriate instructions for adjusting image quality and camera parameters. However, this paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or i…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

2026-04-24 · Harshit Joshi, Priyank Shethia, Jadelynn Dao, Monica S. Lam

General AI

Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections grow. A common workaround is to decompose documents into chunks and assemble answers from…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.5

Neural Recovery of Historical Lexical Structure in Bantu Languages from Modern Data

2026-04-24 · Hillary Mutisya, John Mugane

Research Track A · General AI

We investigate whether neural models trained exclusively on modern morphological data can recover cross-lingual lexical structure consistent with historical reconstruction. Using BantuMorph v7, a transformer over Bantu morphological paradigms, we analyze 14 Eastern and Southern Bantu languages, extract encoder embeddin…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

2026-04-29 · Zhen Zhang, Changyi Yang, Zijie Xia, Zhen Yang, Chengzhi Liu, Zhaotiao Weng, Yepeng Liu, Haobo Chen, Jin Pan, Chenyang Zhao, Yuheng Bu, Alkesh Patel, Zhe Gan, Xin Eric Wang

General AI

The token serves as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introd…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Online Self-Calibration Against Hallucination in Vision-Language Models

2026-05-01 · Minghui Chen, Chenxu Yang, Hengjie Zhu, Dayan Wu, Zheng Lin, Qingyi Si

General AI

Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Pe…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

2026-05-01 · Houyuan Chen, Hong Li, Xianghao Kong, Tianrui Zhu, Shaocong Xu, Weiqing Xiao, Yuwei Guo, Chongjie Ye, Lvmin Zhang, Hao Zhao, Anyi Rao

General AI

Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unif…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Generative Quantum-inspired Kolmogorov-Arnold Eigensolver

2026-05-06 · Yu-Cheng Lin, Yu-Chao Hsu, I-Shan Tsai, Chun-Hua Lin, Kuo-Chung Peng, Jiun-Cheng Jiang, Yun-Yuan Wang, Tzung-Chi Huang, Tai-Yue Li, Kuan-Cheng Chen, Samuel Yen-Chi Chen, Nan-Yow Chen

General AI

High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum circuit simulation, and selected configuration interaction postprocessing. We present the generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE), a parameter-ef…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Trust Region Q Adjoint Matching

2026-05-26 · Yonghoon Dong, Kyungmin Lee, Changyeon Kim, Jaehyuk Kim, Jinwoo Shin

General AI

Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of optimization arising from the multi-step sampling process. Recently, Q-learning with Adjoint Matching (QAM) addressed this issue by reformulating into a memoryless stochastic optimal control (SOC) problem with a …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

LLM Anonymization Against Agentic Re-Identification

2026-06-01 · Ziwen Li, Jianing Wen, Tianshi Li

General AI

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry downstream analytic value of the text. Existing defenses either remove explicit identifiers, perturb text for formal privacy,…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

2026-06-02 · Sanket Badhe, Deep Shah

General AI

Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To addres…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

2026-06-04 · Shaoyang Xu, Jingshen Zhang, Long P. Hoang, Jinyuan Li, Wenxuan Zhang

General AI

Multicultural multi-agent systems are increasingly deployed in globally diverse settings, where different agents are grounded in different cultural backgrounds. Existing cultural evaluation focuses on value alignment: how closely a single agent matches a target culture. Yet alignment is a per-agent property and cannot …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs

2026-06-04 · Gianluca Barmina, Peter Schneider-Kamp, Lukas Galke Poech

General AI

Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based capability attacks w…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

2026-06-04 · Wenbo Pan, Shujie Liu, Chin-Yew Lin, Jingying Zeng, Xianfeng Tang, Xiangyang Zhou, Yan Lu, Xiaohua Jia

General AI

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment s…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

SwiftVR: Real-Time One-Step Generative Video Restoration

2026-06-08 · Jiaqi Yan, Xiangyu Chen, Xinlin Zhong, Haibin Huang, Chi Zhang, Jie Liu, Jiantao Zhou, Xuelong Li

General AI

Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions and the latency-memory…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.5

OpenRoundup: Multi-Table Data Wrangling Through Interactive Visualization

2026-06-10 · Stephen Kasica, Charles Berret, Tamara Munzner

Research Track A

Data journalists routinely integrate records across multiple independently published sources to support accountability reporting, yet no existing interactive wrangling tool treats the collection of tables -- rather than the single table -- as its primary unit of work. We present OpenRoundup, an open-source, browser-bas…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.5

A Sustainable Integrated Framework for Multi-Type Urban Waste Collection and Recycling

2026-06-11 · Víctor Blanco, J. Fernando Camacho-Vallejo, Yolanda Hinojosa

Research Track A

Urban waste management faces increasing operational and environmental challenges driven by population growth, heterogeneous waste streams, traffic congestion, and the need for sustainable collection infrastructures. We present an integrated optimization framework for the design of multi-type urban waste collection and …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.5

Open-World Video Segmentation

2026-06-14 · Qing Su, Kaiyang Li, Yuan Zhuang, Fei Miao, Shihao Ji

Research Track A · General AI

While video segmentation has advanced rapidly on short clips and closed-set benchmarks, open-world video segmentation remains largely unexplored. The challenge is twofold: (1) existing methods are not designed to support object discovery and identity maintenance in long videos of dynamic ego-motion, and (2) existing ev…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

RepSelect: Robust LLM Unlearning via Representation Selectivity

2026-06-15 · Filip Sondej, Yushi Yang, Adam Mahdi

General AI

Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-shot prompting, suggesting their forgetting is only shallow. We identify the root cause. …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

CEO-Bench: Can Agents Play the Long Game?

2026-06-16 · Haozhe Chen, Karthik Narasimhan, Zhuang Liu

General AI

Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring informa…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.5

Kairos: A Native World Model Stack for Physical AI

2026-06-16 · Kairos Team, Fei Wang, Shan You, Qiming Zhang, Tao Huang, Zuoyi Fu, Zhisheng Zheng, Yunlong Xi, Feng Lv, Xiaoming Wu, Zeyu Liu, Cong Wan, Pu Li, Ruiqing Yang, Xiaoou Li, Wei Wang, Kangkang Zhu, Yuwei Zhang, Shi Fu, Zheng Zhang, Xiaoning Wu, Xuzeng Fan, Dacheng Tao, Xiaogang Wang

General AI

World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constraints. We introduce Kai…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.5

UoU: A Universal Fingerprint Foundation Model Based on Large-Scale Unsupervised Learning

2026-06-16 · Xiongjun Guan, Jianjiang Feng, Jie Zhou

Research Track A

Fingerprint recognition is still dominated by task-specific pipelines, where enhancement, structural parsing, alignment, and matching are optimized in isolation. Although effective in narrow settings, this design limits representation reuse across sensors, qualities, and downstream applications. We therefore present Uo…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.5

Splaxel: Efficient Distributed Training of 3D Gaussian Splatting for Large-scale Scene Reconstruction via Pixel-level Communication

2026-06-17 · Wenqi Jia, Zhewen Hu, Ying Huang, Yu Gong, Stavros Kalafatis, Yuke Wang, Wei Niu, Chengming Zhang, Ang Li, Sheng Di, Yuede Ji, Bo Fang, Miao Yin

Research Track A

3D Gaussian Splatting (3DGS) enables high-fidelity and real-time 3D scene reconstruction, but scaling training to large-scale scenes requires optimizing hundreds of millions of Gaussians across multiple GPUs. Existing distributed approaches either partition scenes into isolated regions, causing global inconsistency, or…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

2026-03-26 · Xiaofeng Mao, Shaohao Rui, Kaining Ying, Bo Zheng, Chuanhao Li, Mingmin Chi, Kaipeng Zhang

General AI

Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?

2026-03-31 · Max Kaufmann, David Lindner, Roland S. Zimmermann, Rohin Shah

General AI

Chain-of-Thought (CoT) monitoring, in which automated systems monitor the CoT of an LLM, is a promising approach for effectively overseeing AI systems. However, the extent to which a model's CoT helps us oversee the model - the monitorability of the CoT - can be affected by training, for instance by the model learning …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation

2026-03-31 · Nathan Heath

General AI

Myopic Optimization with Non-myopic Approval (MONA) mitigates multi-step reward hacking by restricting the agent's planning horizon while supplying far-sighted approval as a training signal [Farquhar et al., 2025]. The original paper identifies a critical open question: how the method of constructing approval -- partic…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Phyelds: A Pythonic Framework for Aggregate Computing

2026-03-31 · Gianluca Aguzzi, Davide Domini, Nicolas Farabegoli, Mirko Viroli

General AI

Aggregate programming is a field-based coordination paradigm with over a decade of exploration and successful applications across domains including sensor networks, robotics, and IoT, with implementations in various programming languages, such as Protelis, ScaFi (Scala), and FCPP (C++). A recent research direction inte…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Quantifying Cross-Modal Interactions in Multimodal Glioma Survival Prediction via InterSHAP: Evidence for Additive Signal Integration

2026-03-31 · Iain Swift, JingHua Ye, Ruairi O'Reilly

General AI

Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

RAAP: Retrieval-Augmented Affordance Prediction with Cross-Image Action Alignment

2026-03-31 · Qiyuan Zhuang, He-Yang Xu, Yijun Wang, Xin-Yang Zhao, Yang-Yang Li, Xiu-Shen Wei

General AI

Understanding object affordances is essential for enabling robots to perform purposeful and fine-grained interactions in diverse and unstructured environments. However, existing approaches either rely on retrieval, which is fragile due to sparsity and coverage gaps, or on large-scale models, which frequently mislocaliz…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Reward-Based Online LLM Routing via NeuralUCB

2026-03-31 · Ming-Hua Tsai, Phat Tran

General AI

This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and e…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Trimodal Deep Learning for Glioma Survival Prediction: A Feasibility Study Integrating Histopathology, Gene Expression, and MRI

2026-03-31 · Iain Swift, JingHua Ye

General AI

Multimodal deep learning has improved prognostic accuracy for brain tumours by integrating histopathology and genomic data, yet the contribution of volumetric MRI within unified survival frameworks remains unexplored. This pilot study extends a bimodal framework by incorporating Fluid Attenuated Inversion Recovery (FLA…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Best-Arm Identification with Noisy Actuation

2026-04-02 · Merve Karakas, Osama Hanna, Lin F. Yang, Christina Fragouli

General AI

In this paper, we consider a multi-armed bandit (MAB) instance and study how to identify the best arm when arm commands are conveyed from a central learner to a distributed agent over a discrete memoryless channel (DMC). Depending on the agent capabilities, we provide communication schemes along with their analysis, wh…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation

2026-04-02 · Daiwei Chen, Zhoutong Fu, Chengming Jiang, Haichao Zhang, Ran Zhou, Tan Wang, Chunnan Yao, Guoyao Li, Rui Cai, Yihan Cao, Ruijie Jiang, Fedor Borisyuk, Jianqiang Shen, Jingwei Wu, Ramya Korlakai Vinayak

General AI

Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies on supervised fine-tuning to learn their …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Modulate-and-Map: Crossmodal Feature Mapping with Cross-View Modulation for 3D Anomaly Detection

2026-04-02 · Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano

General AI

We present ModMap, a natively multiview and multimodal framework for 3D anomaly detection and segmentation. Unlike existing methods that process views independently, our method draws inspiration from the crossmodal feature mapping paradigm to learn to map features across both modalities and views, while explicitly mode…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

How AI Aggregation Affects Knowledge

2026-04-06 · Daron Acemoglu, Tianyi Lin, Asuman Ozdaglar, James Siderius

General AI

Artificial intelligence (AI) changes social learning when aggregated outputs become training data for future predictions. To study this, we extend the DeGroot model by introducing an AI aggregator that trains on population beliefs and feeds synthesized signals back to agents. We define the learning gap as the deviation…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Stratifying Reinforcement Learning with Signal Temporal Logic

2026-04-06 · Justin Curry, Alberto Speranzon

General AI

In this paper, we develop a stratification-based semantics for Signal Temporal Logic (STL) in which each atomic predicate is interpreted as a membership test in a stratified space. This perspective reveals a novel correspondence principle between stratification theory and STL, showing that most STL formulas can be view…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads

2026-04-07 · Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang

General AI

Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly sensitive to configuration choices. In practice, this leads to many concurrent LoRA …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

CrowdVLA: Embodied Vision-Language-Action Agents for Context-Aware Crowd Simulation

2026-04-07 · Juyeong Hwang, Seong-Eun Hong, Jinhyun Kim, JaeYoung Seon, Giljoo Nam, Hanyoung Jang, HyeongYeop Kang

General AI

Crowds do not merely move; they decide. Human navigation is inherently contextual: people interpret the meaning of space, social norms, and potential consequences before acting. Sidewalks invite walking, crosswalks invite crossing, and deviations are weighed against urgency and safety. Yet most crowd simulation methods…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Mixture-of-Modality-Experts with Holistic Token Learning for Fine-Grained Multimodal Visual Analytics in Driver Action Recognition

2026-04-07 · Tianyi Liu, Yiming Li, Wenqian Wang, Jiaojiao Wang, Chen Cai, Yi Wang, Kim-Hui Yap

General AI

Robust multimodal visual analytics remains challenging when heterogeneous modalities provide complementary but input-dependent evidence for decision-making. Existing multimodal learning methods mainly rely on fixed fusion modules or predefined cross-modal interactions, which are often insufficient to adapt to changing m…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Density-Driven Optimal Control: Convergence Guarantees for Stochastic LTI Multi-Agent Systems

2026-04-09 · Kooktae Lee

General AI

This paper addresses the decentralized non-uniform area coverage problem for multi-agent systems, a critical task in missions with high spatial priority and resource constraints. While existing density-based methods often rely on computationally heavy Eulerian PDE solvers or heuristic planning, we propose Stochastic De…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding

2026-04-09 · Joungbin An, Agrim Jain, Kristen Grauman

General AI

Video temporal grounding (VTG) is typically tackled with dataset-specific models that transfer poorly across domains and query styles. Recent efforts to overcome this limitation have adapted large multimodal language models (MLLMs) to VTG, but their high compute cost and limited video context still hinder long-video gr…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

What They Saw, Not Just Where They Looked: Semantic Scanpath Similarity via VLMs and NLP metric

2026-04-09 · Mohamed Amine Kerkouri, Marouane Tliba, Bin Wang, Aladine Chetouani, Ulas Bagci, Alessandro Bruno

General AI

Scanpath similarity metrics are central to eye-movement research, yet existing methods predominantly evaluate spatial and temporal alignment while neglecting semantic equivalence between attended image regions. We present a semantic scanpath similarity framework that integrates vision-language models (VLMs) into eye-tr…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

The nextAI Solution to the NeurIPS 2023 LLM Efficiency Challenge

2026-04-10 · Gyuwon Park, DongIl Shin, SolGil Oh, SangGi Ryu, Byung-Hak Kim

General AI

The rapid evolution of Large Language Models (LLMs) has significantly impacted the field of natural language processing, but their growing complexity raises concerns about resource usage and transparency. Addressing these challenges, we participated in the NeurIPS LLM Efficiency Challenge, aiming to fine-tune a foundat…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

CLSGen: A Dual-Head Fine-Tuning Framework for Joint Probabilistic Classification and Verbalized Explanation

2026-04-13 · WonJin Yoon, Kangyu Zhu, Ian Bulovic, Autumn Sehy, Yanjun Gao, Dmitriy Dligach, Majid Afshar, Timothy A. Miller

Research Track A · General AI

With the recent progress of Large Language Models (LLMs), there is a growing interest in applying these models to solve complex and challenging problems. Modern LLMs, capable of processing long contexts and generating verbalized explanations, offer significant potential in addressing real-world applications. However, a…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Disentangled Point Diffusion for Precise Object Placement

2026-04-13 · Lyuxing He, Eric Cai, Shobhit Aggarwal, Jianjun Wang, David Held

General AI

Recent advances in robotic manipulation have highlighted the effectiveness of learning from demonstration. However, while end-to-end policies excel in expressivity and flexibility, they struggle both in generalizing to novel object geometries and in attaining a high degree of precision. An alternative, object-centric a…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

2026-04-14 · Kathakoli Sengupta, Kai Ao, Paola Cascante-Bonilla

General AI

Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene graphs, yet evaluation still relies on LLM or VLM judges that score rendered views, making judgments sensitive to viewpoint, prompt phrasing, and hallucination. Wh…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

TLoRA+: A Low-Rank Parameter-Efficient Fine-Tuning Method for Large Language Models

2026-04-15 · Yarui Cao, Kai Liu

General AI

Fine-tuning large language models (LLMs) aims to adapt pre-trained models to specific tasks using relatively small and domain-specific datasets. Among Parameter-Efficient Fine-Tuning (PEFT) methods, Low-Rank Adaptation (LoRA) stands out by matching the performance of full fine-tuning while avoiding additional inference…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

CAVERS: Multimodal SLAM Data from a Natural Karstic Cave with Ground Truth Motion Capture

2026-04-16 · Giacomo Franchini, David Rodríguez-Martínez, Alfonso Martínez-Petersen, C. J. Pérez-del-Pulgar, Marcello Chiaberge

General AI

Autonomous robots operating in natural karstic caves face perception and navigation challenges that are qualitatively distinct from those encountered in mines or tunnels: irregular geometry, reflective wet surfaces, near-zero ambient light, and complex branching passages. Yet publicly available datasets targeting this …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Generalization in LLM Problem Solving: The Case of the Shortest Path

2026-04-16 · Yao Tong, Jiayuan Ye, Anastasia Borovykh, Reza Shokri

General AI

Whether language models can systematically generalize remains actively debated. Yet empirical performance is jointly shaped by multiple factors such as training data, training paradigms, and inference-time strategies, making failures difficult to interpret. We introduce a controlled synthetic environment based on short…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Prism: Symbolic Superoptimization of Tensor Programs

2026-04-16 · Mengdi Wu, Xiaoyu Jiang, Oded Padon, Zhihao Jia

General AI

This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constru…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Think in Latent Thoughts: A New Paradigm for Gloss-Free Sign Language Translation

2026-04-16 · Yiyang Jiang, Li Zhang, Xiao-Yong Wei, Li Qing

General AI

Many SLT systems quietly assume that brief chunks of signing map directly to spoken-language words. That assumption breaks down because signers often create meaning on the fly using context, space, and movement. We revisit SLT and argue that it is mainly a cross-modal reasoning task, not just a straightforward video-to…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization

2026-04-17 · Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin

General AI

We propose HILBERT (HIerarchical Long-sequence Balanced Embedding with Reciprocal contrastive Training), a cross-attentive multimodal framework for learning document-level audio-text representations from long, segmented sequences in low-resource data settings. HILBERT leverages frozen pre-trained speech and language en…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

2026-04-17 · Xiangbo Gao, Sicong Jiang, Bangya Liu, Xinghao Chen, Minglai Yang, Siyuan Yang, Mingyang Wu, Jiongze Yu, Qi Zheng, Haozhi Wang, Jiayi Zhang, Jared Yang, Jie Yang, Zihan Wang, Qing Yin, Zhengzhong Tu

General AI

As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured footage to meet professional requirements. Yet the field still lacks both a large-scale human-annotated dataset with complete editing examples and a standardized evaluat…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs

2026-04-20 · Kevin Murphy

General AI

We present BLF (Bayesian Linguistic Forecaster), an agentic system for binary forecasting that achieves state-of-the-art performance on the ForecastBench benchmark. The system is built on three ideas. (1) A Bayesian linguistic belief state: a semi-structured representation combining numerical probability estimates with…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective

2026-04-20 · Sijie Mai, Shiqin Han

General AI

Multimodal affective computing aims to predict humans' sentiment, emotion, intention, and opinion using language, acoustic, and visual modalities. However, current models often learn spurious correlations that harm generalization under distribution shifts or noisy modalities. To address this, we propose a causal modali…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models

2026-04-21 · Yusuf Çelebi, Yağız Asker, Özay Ezerceli, Mahmoud ElHussieni, Selva Taş, Reyhan Bayraktar, Fatma Betül Terzioğlu

General AI

Fine-tuning Large Language Models (LLMs) remains structurally uncertain despite parameter-efficient methods such as Low-Rank Adaptation (LoRA), as the layer-specific roles of internal representations are poorly understood, leading to heuristic decisions about where adaptation should be applied. We model the evolution o…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model

2026-04-21 · Zewei Zhou, Ruining Yang, Xuewei Qi, Yiluan Guo, Sherry X. Chen, Tao Feng, Kateryna Pistunova, Yishan Shen, Lili Su, Jiaqi Ma

General AI

Vision-Language-Action (VLA) models offer a promising autonomous driving paradigm for leveraging world knowledge and reasoning capabilities, especially in long-tail scenarios. However, existing VLA models often struggle with the high latency in action generation using an autoregressive generation framework and exhibit …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Can "AI" Be a Doctor? A Study of Empathy, Readability, and Alignment in Clinical LLMs

2026-04-22 · Mariano Barone, Francesco Di Serio, Roberto Moio, Marco Postiglione, Giuseppe Riccio, Antonio Romano, Vincenzo Moscato

General AI

Large Language Models (LLMs) are increasingly deployed in healthcare, yet their communicative alignment with clinical standards remains insufficiently quantified. We conduct a multidimensional evaluation of general-purpose and domain-specialized LLMs across structured medical explanations and real-world physician-patie…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

2026-04-22 · Hyeonwoo Kim, Jeonghwan Kim, Kyungwon Cho, Hanbyul Joo

General AI

Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object categories, including complex dexterous manipulations that are difficult to capture with motion capture systems. While the rich interaction knowledge embedded in these…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

SWE-chat: Coding Agent Interactions From Real Users in the Wild

2026-04-22 · Joachim Baumann, Vishakh Padmakumar, Xiang Li, John Yang, Diyi Yang, Sanmi Koyejo

General AI

AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild. The dataset currently contai…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

2026-04-22 · Yiming Bian, Joshua M. Akey

General AI

The scalability of long-context large language models is fundamentally limited by the quadratic memory cost of exact self-attention, which often leads to out-of-memory (OOM) failures on modern hardware. Existing methods improve memory efficiency to near-linear complexity, while assuming that the full query, key, and va…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

OptiMat Alloys: A FAIR End-to-End Agent with Living Database for Computational Multi-Principal Alloy Exploration

2026-04-23 · Yang Hu, Vladyslav Turlo

General AI

The FAIR principles have transformed how computational data and workflows are shared in materials research, yet existing repositories can only serve pre-computed entries -- broad coverage is perpetually incomplete and cannot adapt to new questions on demand. To address these challenges, we present OptiMat Alloys, a lar…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference

2026-04-24 · Yuqi Pan, Jinghao Zhuang, Yupeng Feng, Fangzhi Zhong, Siyu Ding, Xuerui Qiu, Shaowei Gu, Bohan Sun, Zhiyong Qin, Yibo Zhong, Lingtao Ouyang, Kun Yang, Zehao Liu, Yuhong Chou, Shurong Wang, Anjie Hu, Han Xu, Bo Xu, Guoqi Li

General AI

Scaling context length is reshaping large-model development, yet full-attention Transformers suffer from prohibitive computation and inference bottlenecks at long sequences. A key challenge is to design foundation models that maintain performance and long-context efficiency with minimal training overhead. We introduce …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought

2026-04-24 · Keshav Ramji, Tahira Naseem, Ramón Fernandez Astudillo

General AI

While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate during inference. Non-verbal reasoning methods have emerged with shorter generation lengths by leveraging continuous representations, yet their performance lags behind verbalized CoT. We propose $\…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters

2026-04-27 · Aaryan Shah, Andrew Hines, Alexia Downs, Denis Bajet, Paulius Mui, Fabiano Araujo, Laura Offutt, Aida Rutledge, Elizabeth Jimenez

General AI

Objective. Clinical AI documentation systems require evaluation methodologies that are clinically valid, economically viable, and sensitive to iterative changes. Methods requiring expert review per scoring instance are too slow and expensive for safe, iterative deployment. We present a case-specific, clinician-authored…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Contextual Linear Activation Steering of Language Models

2026-04-27 · Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin

General AI

Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input pro…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift

2026-04-27 · Lixian Chen, Mingxuan Huang, Yanhui Chen, Junyi Lin, Yang Shi

General AI

Vision-language models transfer well in zero-shot settings, but at deployment the visual and textual branches often shift asymmetrically. Under this condition, entropy-based test-time adaptation can sharpen the fused posterior while increasing error, because an unreliable modality may still dominate fusion. We study th…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

2026-04-27 · Zhiheng Liu, Weiming Ren, Xiaoke Huang, Shoufa Chen, Tianhong Li, Mengzhao Chen, Yatai Ji, Sen He, Jonas Schult, Belinda Zeng, Tao Xiang, Wenhu Chen, Ping Luo, Luke Zettlemoyer, Yuren Cong

General AI

Unified multimodal models typically rely on pretrained vision encoders and use separate visual representations for understanding and generation, creating misalignment between the two tasks and preventing fully end-to-end optimization from raw pixels. We introduce Tuna-2, a native unified multimodal model that performs …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers

2026-04-28 · Jan Dubiński, Jan Betley, Anna Sztyber-Betley, Daniel Tan, Owain Evans

General AI

Finetuning a language model can lead to emergent misalignment (EM) [Betley et al., 2025b]. Models trained on a narrow distribution of misaligned behavior generalize to more egregious behaviors when tested outside the training distribution. We study a set of interventions proposed to reduce EM. We confirm that these int…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling

2026-04-28 · Lucio La Cava, Andrea Tagarelli

General AI

Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions

2026-04-28 · An Nguyen, Hoang Nguyen, Phuong Le, Hung Pham, Cuong Do, Laurent El Ghaoui

General AI

We study city-scale control of electric-vehicle (EV) ride-hailing fleets where dispatch, repositioning, and charging decisions must respect charger and feeder limits under uncertain, spatially correlated demand and travel times. We formulate the problem as a hex-grid semi-Markov decision process (semi-MDP) with mixed a…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

DreamProver: Evolving Transferable Lemma Libraries via a Wake-Sleep Theorem-Proving Agent

2026-04-29 · Youyuan Zhang, Jialiang Sun, Hangrui Bi, Chuqin Geng, Wenjie Ma, Zhaoyu Li, Xujie Si

General AI

We introduce DreamProver, an agentic framework that leverages a "wake-sleep" program induction paradigm to discover reusable lemmas for formal theorem proving. Existing approaches either rely on fixed lemma libraries, which limit adaptability, or synthesize highly specific intermediate lemmas tailored to individual the…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Graph-based Semantic Calibration Network for Unaligned UAV RGBT Image Semantic Segmentation and A Large-scale Benchmark

2026-04-29 · Fangqiang Fan, Zhicheng Zhao, Xiaoliang Ma, Chenglong Li, Jin Tang

General AI

Fine-grained RGBT image semantic segmentation is crucial for all-weather unmanned aerial vehicle (UAV) scene understanding. However, UAV RGBT semantic segmentation faces two coupled challenges: cross-modal spatial misalignment caused by sensor parallax and platform vibration, and severe semantic confusion among fine-gr…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Select to Think: Unlocking SLM Potential with Local Sufficiency

2026-04-29 · Wenxuan Ye, Yangyang Zhang, Xueli An, Georg Carle, Yunpu Ma

General AI

Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls intro…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance

2026-04-29 · Lingfeng Zhang, Xiaoshuai Hao, Xizhou Bu, Yingbo Tang, Hongsheng Li, Jinghui Lu, Xiu-shen Wei, Jiayi Ma, Yu Liu, Jing Zhang, Hangjun Ye, Xiaojun Liang, Long Chen, Wenbo Ding

General AI

Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-h…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Beyond Generative Decoding: Discriminative Hidden-State Readout from a Native Omni-Modal LLM for Multimodal Sentiment Analysis

2026-06-04 · Bin Wen, Tien-Ping Tan

General AI

Multimodal sentiment analysis (MSA) infers human affect from language, acoustic, and visual signals. Recent methods increasingly adapt large multimodal models (LMMs) via generative readout: prompting the model to emit a sentiment score as a text string. While convenient, this ties continuous regression to discrete auto…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

2026-06-04 · Liliana Hotsko, Yinxi Li, Yuntian Deng, Pengyu Nie

General AI

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolving codebases. We in…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Flow-based Policy Adaptation without Policy Updates

2026-06-04 · Luzhe Sun, Jingtian Ji, Haoran Chen, Jiawei Zhou, Matthew R. Walter

General AI

Leveraging prior knowledge from pretrained policies, foundation models, or human operators offers an efficient alternative to learning robot skills from scratch. However, these agents often provide actions that are suboptimal, noisy, or misaligned with task-specific expert behavior. We propose GLOVES, a family of flow-…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

2026-06-04 · Heng-Jui Chang, Alexander H. Liu, Saurabhchand Bhati, Mrudula Athi, Anton Ratnarajah, Amit Chhetri, James Glass

General AI

Audio encoders are critical to modern audio applications as large language models (LLMs) increasingly rely on a single encoder for diverse inputs. While self-supervised learning (SSL) has yielded strong domain-specific encoders like speech or music experts, multi-domain approaches like USAD and SPEAR remain limited in …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

FASE: Fast Adaptive Semantic Entropy for Code Quality

2026-06-08 · Shizhe Lin, Ladan Tahvildari

General AI

Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle. However, system reliability remains hindered by LLM hallucinations and error propagation across interacting agents. While semantic entropy provides a principled way to quan…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

GenEyePose: Patient-Free, Knowledge-Based Saccadic Eye Movement Modeling for Digital Neurophysiologic Biomarker Development

2026-06-08 · Tianyu Lin, Jooyoung Ryu, Puvada Sreevarsha, Rahul Srinivasaragavan, Riya Satavlekar, Susan Kim, Nidhi Soley, Yujie Yan, Ishan Vatsaraj, Carl Harris, Aimon Rahman, Vishal Patel, Joseph Greenstein, Casey Taylor, Kemar E. Green

General AI

Eye movements, including saccades, are widely regarded as highly sensitive and objective biomarkers of neurophysiologic states. Detecting saccadic signatures in neurologic diseases offers a rapid, portable alternative to brain imaging, avoiding access and cost barriers. Currently, there are no robust AI-enabled video-o…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Latent Spatial Memory for Video World Models

2026-06-08 · Weijie Wang, Haoyu Zhao, Yifan Yang, Feng Chen, Zeyu Zhang, Yefei He, Zicheng Duan, Donny Y. Chen, Yuqing Yang, Bohan Zhuang

General AI

Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel space discards rich …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

2026-06-08 · Gianluca Barmina, Federico Torrielli, Sven Harms, Jacob Nielsen, Felix Mächtle, Stine Lyngsø Beltoft, Peter Schneider-Kamp, Thomas Eisenbarth, Lukas Galke Poech, Anne Lauscher

General AI

Large language models (LLMs) routinely face requests that should be refused, creating a trade-off between helpfulness and harm prevention. However, refusals themselves can be helpful. In high-risk interactions involving crisis, coercion, or escalating intent, blunt non-compliance may prevent direct harm while still fai…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

What the Eyes See, the LLMs Miss: Exploiting Human Perception for Adversarial Text Attacks

2026-06-08 · Qin Yang, Lu Malloy, Joshua Lee, Xiaohan Chang, Meisam Mohammady, Doowon Kim, Yuan Hong

General AI

Large language model (LLM)-powered content moderation systems have become a critical defense against harmful online content. However, these systems primarily operate on tokenized text and largely ignore the visual cues that humans naturally rely on when interpreting content. We show that this discrepancy creates a fund…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Predicting Future Behaviors in Reasoning Models Enables Better Steering

2026-06-09 · Evgenii Kortukov, Piotr Komorowski, Florian Klein, Paula Engl, Gabriele Sarti, Seong Joon Oh, Sebastian Lapuschkin, Wojciech Samek

General AI

Deployed large reasoning models (LRMs) often behave unexpectedly. Test-time steering controls LRM outputs by intervening on their hidden representations, but it can degrade output quality. We argue that prior steering work implicitly relies on internal features that detect behavior in already generated text. We show th…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

SSR-Merge: Subspace Signal Routing for Training-Free LoRA Merging in Diffusion Models

2026-06-09 · Zhengxuan Wei, Yi Dong, Zonghui Li, Xianhui Lin, Xing Liu, Hong Gu, Shaofeng Zhang, Wenbin Li, Qi Fan

General AI

Low-Rank Adaptation (LoRA) merging can efficiently combine diverse generative capabilities from multiple trained LoRAs for a diffusion model. However, existing LoRA merging techniques often suffer from severe parameter interference, causing destructive collisions in the shared parameter space. To address this, we propo…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Automated reproducibility assessments in the social and behavioral sciences using large language models

2026-06-11 · Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger, Stefan Rose, Sarah Ball, Bolei Ma, Frauke Kreuter, Markus Weinmann, Stefan Feuerriegel

General AI

Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches are resource-intensive and difficult to scale. Here, we show that large language models (LLMs) can a…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Beyond LoRA: Is Sparsity-Induced Adaptation Better?

2026-06-11 · Elijah Cadenhead, Cristian McGee, Xin Li, El Houcine Bergou, Aritra Dutta

General AI

Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how the structural restrictions on low-rank updates preserve effective adaptation performanc…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models

2026-06-11 · Jialin Gan, Xin Qiu, Guangzhe Chen, Xue Wang

Research Track A · General AI

Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamentally different information structures, making uniform token processing inefficient. In this paper, we…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Data-Driven Decoding of Russell's Circumplex Model of Affect

2026-06-15 · Amdjed Belaref, Samir Sadok, Zineb Noumir, Renaud Seguier

General AI

Affective computing increasingly relies on deep learning to represent emotions, yet latent spaces often remain opaque, high-dimensional black boxes. This paper investigates whether Transformers' embeddings recover the geometric regularities of Russell's circumplex model. We unify two complementary experiments testing t…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Exact Posterior Score Estimation for Solving Linear Inverse Problems

2026-06-15 · Abbas Mammadov, Ozgur Kara, Kaan Oktay, Iskander Azangulov, Adil Kaan Akan, Hyungjin Chung, James Matthew Rehg, Yee Whye Teh

General AI

Diffusion and flow-based models learn powerful data priors by training a denoiser to reverse Gaussian corruption. To use this prior to solve a linear inverse problem, one needs to sample from the posterior, but the score that the prior provides is the unconditional score, not the posterior score. Existing methods eithe…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

SoK: Security and Privacy of Foundation-Model-Powered Robots

2026-06-15 · Xueluan Gong, Chen Chen, Jinxin Liu, Qian Wang, Kwok-Yan Lam

General AI

Foundation models are reshaping robotics by enabling robots to interpret open-ended instructions, reason over multimodal contexts, and operate in complex, open-world environments. However, their integration also introduces security and privacy (S&P) risks that extend beyond the FMs themselves to embodied execution pipe…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation

2026-06-16 · Abir Ashab Niloy, Ahmed Ryan, Imamul Hossain Rafi, Md Erfan, Md Rayhanur Rahman

General AI

Multi-stage cyberattacks span system, network, and browser logs. Detecting them requires correlating events across all three sources. Machine learning methods can learn these cross-source patterns, but they need labeled multi-source data. Existing public datasets fall short. Network-only datasets such as CICIDS and UNS…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

One-Step Token-to-Waveform Generation with MeanFlow in Latent Space

2026-06-16 · Zheqi Dai, Guangyan Zhang, Zhen Ye, Jingyu Li, Haolin He, Chunyat Wu, Yiwen Guo, Qiuqiang Kong

General AI

Neural audio codecs are central to modern LLM-based Text-to-Speech (TTS) and multimodal systems. As low-bitrate semantic codecs gain prominence, the Token-to-Waveform (Token2Wav) decoder becomes a bottleneck determining both perceptual quality and system efficiency. Conventional multi-step flow-matching decoders offer …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors

2026-06-17 · Michael Finkelson, Daniel Segal, Eitan Richardson, Shahar Armon, Nani Goldring, Poriya Panet, Nir Zabari, Benjamin Brazowski, Or Patashnik, Yoav HaCohen

General AI

Existing multi-speaker dialogue systems bind speakers to utterances through structured supervision: per-turn tags, multi-stream transcriptions, or learnable speaker embeddings. These systems operate within speech-only pipelines that produce clean vocal sequences without the ambient texture of real conversations. We tak…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.3

Risk Stratification for ICU Delirium using Pervasive Ambient Sensing Information

2026-06-17 · Jiaqing Zhang, Sabyasachi Bandyopadhyay, Miguel Contreras, Jessica Sena, Yuanfang Ren, Andrea Davidson, Ziyuan Guan, Tezcan Ozrazgat-Baslanti, Subhash Nerella, Azra Bihorac, Parisa Rashidi

General AI

Delirium is a common and serious complication in the Intensive Care Unit (ICU), associated with increased morbidity, prolonged hospital stays, and higher healthcare costs. Despite its prevalence, early prediction and prevention remain challenging. Environmental factors such as ambient sound and light may influence the …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.2

ProCUA-SFT Technical Report

2026-06-15 · Jaehun Jung, Ximing Lu, Brandon Cui, Muhammad Khalifa, Shaokun Zhang, Hao Zhang, Jin Xu, Amala Sanjay Deshmukh, Karan Sapra, Andrew Tao, Yejin Choi, Jan Kautz, Mingjie Liu, Yi Dong

Research Track B · General AI

Training computer-use agents (CUAs) -- models that interact with graphical desktops through screenshots and keyboard/mouse actions -- requires large-scale, diverse trajectory data collected in full desktop environments. The largest public resource, AgentNet (22.5K human trajectories), leads to negative transfer when us…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.2

Virtual Simulation for Mental Health

2026-06-23 · Anna Fang

General AI

Poorly designed interventions or those deployed without adequate safeguards can harm the communities they aim to serve, thus exacerbating existing vulnerabilities and leaving individuals unsupported. This is especially the case for the mental health context, where there is a growing trend of relying on technological in…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.2

BlowLive: Blow-Based Multi-Factor Biometrics with Liveness Detection and Revocability

2026-06-24 · Eyasu Getahun Chekole, Howard Halim, Daniël Reijsbergen, Jianying Zhou

General AI

Biometric authentication systems are increasingly deployed in security-critical applications, yet existing physiological and behavioral biometrics suffer from fundamental limitations: 1) they are vulnerable to spoofing attacks due to unreliable liveness detection, 2) biometric templates may leak privacy-sensitive infor…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.0

Code World Model Preparedness Report

2026-05-01 · Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd, Nathaniel Li, Ziwen Han, Jean-Christophe Testud, Saisuke Okabayashi, Maeve Ryan, Jinpeng Miao, Hamza Kwisaba, Felix Binder, Spencer Whitman, Jim Gust, Esteban Arcaute, Dhaval Kapil, Jacob Kahn, Ayaz Minhas, Tristan Goodman, Lauren Deason, Alexander Vaughan, Shengjia Zhao, Summer Yue

General AI

This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned pro…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.0

From Context to Skills: Can Language Models Learn from Context Skillfully?

2026-05-03 · Shuzheng Si, Haozhe Zhao, Yu Lei, Qingyi Wang, Dingwei Chen, Zhitong Wang, Zhenhailong Wang, Kangyang Luo, Zheng Wang, Gang Chen, Fanchao Qi, Minjia Zhang, Maosong Sun

General AI

Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.0

RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation

2026-05-06 · Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov, Mikhail Kulakov

General AI

We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned har…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.0

CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs

2026-05-07 · Zhengru Fang, Yanan Ma, Yu Guo, Senkang Hu, Yixian Zhang, Hangcheng Cao, Wenbo Ding, Yuguang Fang

Research Track A · General AI

When a chest X-ray shows consolidation but the question asks which finding is present, a medical vision-language model may answer "No consolidation." This is more than an incorrect choice: it is a polarity reversal that emits a clinical statement contradicting the image. We study this failure as negated-option attracti…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.0

TIDE: Every Layer Knows the Token Beneath the Context

2026-05-07 · Ajay Jaiswal, Lauren Hannah, Han-Byul Kim, Duc Hoang, Mehrdad Farajtabar, Minsik Cho

General AI

We revisit a universally accepted but under-examined design choice in every modern LLM: a token index is looked up once at the input embedding layer and then permanently discarded. This single-injection assumption induces two structural failures: (i) the Rare Token Problem, where a Zipf-type distribution of vocabulary …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.0

From Expansion to Consolidation: Socio-Spatial Contagion Dynamics in Off-Grid PV Adoption

2026-05-10 · Roni Blushtein-Livnon, Tal Svoray, Itay Fischhendler, Havatzelet Yahel, Emir Galilee

Research Track A

In traditional rural societies, where social ties are embedded in physical space, the diffusion of emerging technologies may be amplified through socio-spatial contagion (SSC). Such processes may play a key role in accelerating residential PV adoption in off-grid regions. Yet empirical evidence on SSC in PV adoption re…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.0

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

2026-05-12 · Xinjie Shen, Rongzhe Wei, Peizhi Niu, Haoyu Wang, Ruihan Wu, Eli Chien, Bo Li, Pin-Yu Chen, Pan Li

General AI

Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful objective in a single prompt, increasingly capable attackers can distribute their intent across multiple benign-looking turns. Recent studies show that even modern commercial mo…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 7.0

Distributed optimal control problems governed by poroelasticity equations

2026-05-29 · Arbaz Khan, Jeonghun J. Lee, Harpal Singh

Research Track A

In this paper, we propose and analyze a novel two-field symmetric formulation with solid displacement and fluid pressure as the main unknowns for Biot's consolidation model in poroelasticity. We first prove the well-posedness of the new formulation and then show the existence and uniqueness of optimal control where …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 7.0

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

2026-06-18 · Luca Zedda, Davide Antonio Mura, Cecilia Di Ruberto, Maurizio Atzori, Muhammed Furkan Dasdelen, Carsten Marr, Andrea Loddo

General AI

Attention-based Multiple Instance Learning aggregators in medical imaging are prone to attention concentration, producing overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this through four synergistic architectural components: RMSNorm-based pre-normalization, pe…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

SCALE-LoRA: Auditing Post-Retrieval LoRA Composition with Residual Merging and View Reliability

2026-05-02 · Shuaipeng Zhou, Yu Zhang

General AI

Libraries of Low-Rank Adaptation (LoRA) adapters are becoming a practical by-product of parameter-efficient adaptation. Once such adapters accumulate, a natural question is no longer how to train one adapter for one task, but how to reuse an open pool of adapters for a new task given only a small support set. Prior wor…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

FunFuzz: An LLM-Powered Evolutionary Fuzzing Framework

2026-05-04 · Mario Rodríguez Béjar, B. Romera-Paredes, Jose L. Hernández-Ramos

General AI

Modern fuzzers increasingly use Large Language Models (LLMs) to generate structured inputs, but LLM-driven fuzzing is sensitive to prompt initialization and sampling variance, which can reduce exploration efficiency and lead to redundant inputs. We present FunFuzz, a multi-island evolutionary fuzzing framework that run…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

2026-05-04 · Shikhar Shukla

General AI

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length γ, which determines how many tokens the draft model proposes per step. Nearly all exis…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Relit-LiVE: Relight Video by Jointly Learning Environment Video

2026-05-07 · Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang

General AI

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decompositio…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Rethinking Adapter Placement: A Dominant Adaptation Module Perspective

2026-05-07 · Suoxin Zhang, Run He, Di Fang, Xiang Tan, Kaixuan Chen, Huiping Zhuang

General AI

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method that places trainable low-rank adapters into frozen pre-trained models. Recent studies show that using fewer LoRA adapters may still maintain or even improve performance, but existing methods still distribute adapters broadly, leaving wh…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

2026-05-07 · Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier

General AI

Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation

2026-05-12 · Yabo Zhang, Kunchang Li, Dewei Zhou, Xinyu Huang, Xun Wang

General AI

While recent advancements in multimodal language models have enabled image generation from expressive multi-image instructions, existing methods struggle to maintain performance under complex interleaved instructions. This limitation stems from the structural separation of images and text in current paradigms, which fo…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

2026-05-22 · Zisu Huang, Jingwen Xu, Yifan Yang, Ziyang Gong, Qihao Yang, Muzhao Tian, Xiaohua Wang, Changze Lv, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Xue Yang, Dongdong Chen, Xiaoqing Zheng, Chong Luo

General AI

Language agents increasingly improve by reusing skills -- structured procedural artifacts distilled from past experience. In particular, domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recurring procedures, and…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Human Decision-Making with Persuasive and Narrative LLM Explanations

2026-05-22 · Laura R. Marusich, Mary Grace Kozuch Dhooghe, Jonathan Z. Bakdash, Murat Kantarcioglu

General AI

Large language models (LLMs) have the potential to aid and improve human decision-making in classification tasks, not only by providing fairly accurate predictions, but also in their ability to generate cogent narrative explanations of those predictions. Prior work has demonstrated that people generally find AI narrati…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

2026-05-22 · Anastasiia Sedova, Natalie Schluter, Skyler Seto, Maartje ter Hoeve

General AI

Cross-lingual knowledge transfer is critical for building high-performing multilingual language models for languages with insufficient training data. When target language data is scarce, the knowledge required for many downstream tasks involving scientific reasoning, commonsense inference, and world knowledge must be a…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

2026-05-22 · Yifan Lu, Qi Wu, Jay Zhangjie Wu, Zian Wang, Huan Ling, Sanja Fidler, Xuanchi Ren

General AI

Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a decoder maps the generated latents back to pixels. Yet the latent-to-pixel decoder is reconstruction-oriented, optimized to invert the encoder rather than synth…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

SFG-ROS: A Resource-Aware Framework for Dense Multi-Agent Perception

2026-05-22 · Constantin Blessing, Elias Geiger, Jakob Häringer, Dennis Grewe, Markus Enzweiler

General AI

Deploying heterogeneous multi-agent robot fleets for collaborative perception requires robust data exchange and scalable software architectures. However, standard ROS 2 implementations often suffer from network saturation, namespace collisions, and severe computational overhead when distributing dense sensor streams ac…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models

2026-05-27 · Fei Deng, Yanwu Xu, Zhipeng Bao, Zhixing Zhang, Haolin Jia, Karthik Raveendran, Jianing Wei

General AI

The remarkable generation quality of modern diffusion models often comes at the cost of massive parameter counts, which necessitate server-side inference with significant computational costs and potential privacy risks. Consequently, there is growing momentum toward developing efficient on-device alternatives. While re…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

GPIC: A Giant Permissive Image Corpus for Visual Generation

2026-05-28 · Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang, Michael Poli, Juan Carlos Niebles, Justin Johnson, Jiajun Wu, Li Fei-Fei

General AI

Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels. GPIC comprises diverse internet images captioned by a state-of-the-art vision-language model, including 100M training, 200K va…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

2026-05-28 · Nhat-Minh Nguyen

General AI

Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 sessions to build CLAX-PT, a differentiable one-loop perturbation theory module in JAX. We documented and classified 15 s…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Choosing the Lens: Strategic Perspective Activation in Context-Dependent Argumentation

2026-05-29 · Albert Sadowski, Jarosław A. Chudziak

General AI

The same arguments often need to be evaluated under different external regimes. An agent with influence over the regime has a strategic lever that standard formalisms do not directly capture. We introduce context-dependent argumentation frameworks (CDAFs), an extension of Dung's theory in which a defeat function determ…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Personalize Your Large Vision-language Models With In-context Prompt Tuning

2026-05-29 · Yanshu Li, Jiaqian Li, Kuai Yu, Xi Xiao, Dongfang Liu, Tianyang Wang, Ruixiang Tang

General AI

Large vision-language models (LVLMs) have demonstrated strong general multimodal capability and are increasingly deployed in downstream systems. This trend has driven growing interest in LVLM personalization, which aims to enable models to quickly and effectively learn out-of-distribution multimodal concepts to meet us…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Reliable Multilingual Orthopedic Decision Support from Clinical Narratives: Language-Aware Adaptation and Verification-Guided Deferral

2026-05-29 · Danish Ali, Li Xiaojian, Sundas Iqbal, Farrukh Zaidi

General AI

Multilingual orthopedic decision support remains challenging in low-resource healthcare settings, where clinical narratives contain specialized terminology, mixed scripts, incomplete evidence, label imbalance and language-dependent documentation patterns. This article presents a reliability-oriented framework for class…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

Stateful Online Monitoring Catches Distributed Agent Attacks

2026-05-29 · Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong, Ivan Zhang, Matan Shtepel, Steffi Chern, Alexander Robey, Eric Wong, Hamed Hassani

General AI

Language models can find thousands of severe software vulnerabilities, and agents are increasingly being misused for cyberattacks. To avoid detection, attackers frequently distribute their misuse, splitting a harmful task across many user accounts so each individual transcript looks benign. Because safety monitors scor…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.8

When Are Multimodal Predictions Biologically Supported? A Diagnostic Evaluation Framework

2026-05-29 · Dylan Steiner, Gustavo Arango-Argoty, Gerald Sun, Etai Jacob

General AI

Multimodal models in oncology can produce accurate predictions, but accurate prediction does not reveal whether the model has learned biology that is shared across modalities, biology confined to one modality, or spurious correlations that reflect confounders rather than genuine biology. We introduce DECAT, a model-agn…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Extending Precipitation Nowcasting Horizons via Spectral Fusion of Radar Observations and Foundation Model Priors

2026-03-23 · Yuze Qin, Qingyong Li, Zhiqing Guo, Wen Wang, Yan Liu, Yangli-ao Geng

General AI

Precipitation nowcasting is critical for disaster mitigation and aviation safety. However, radar-only models frequently suffer from a lack of large-scale atmospheric context, leading to performance degradation at longer lead times. While integrating meteorological variables predicted by weather foundation models offers…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

2026-03-25 · Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi

General AI

Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.5

GridVAD: Open-Set Video Anomaly Detection via Spatial Reasoning over Stratified Frame Grids

2026-03-26 · Mohamed Eltahir, Ahmed O. Ibrahim, Obada Siralkhatim, Tabarak Abdallah, Sondos Mohamed

Research Track A · General AI

Vision-Language Models (VLMs) are powerful open-set reasoners, yet their direct use as anomaly detectors in video surveillance is fragile: without calibrated anomaly priors, they alternate between missed detections and hallucinated false alarms. We argue the problem is not the VLM itself but how it is used. VLMs should…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

2026-03-26 · Han Zhang, Guo-Hua Yuan, Chaohao Yuan, Tingyang Xu, Tian Bian, Hong Cheng, Wenbing Huang, Deli Zhao, Yu Rong

General AI

Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.5

Dual-Stage Invariant Continual Learning under Extreme Visual Sparsity

2026-03-27 · Rangya Zhang, Jiaping Xiao, Lu Bai, Yuhang Zhang, Mir Feroskhan

Research Track A

Continual learning seeks to maintain stable adaptation under non-stationary environments, yet this problem becomes particularly challenging in object detection, where most existing methods implicitly assume relatively balanced visual conditions. In extreme-sparsity regimes, such as those observed in space-based residen…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.5

HAD: Heterogeneity-Aware Distillation for Lifelong Heterogeneous Learning

2026-03-27 · Xuerui Zhang, Xuehao Wang, Zhan Zhuang, Linglan Zhao, Ziyue Li, Xinmin Zhang, Zhihuan Song, Yu Zhang

Research Track A

Lifelong learning aims to preserve knowledge acquired from previous tasks while incorporating knowledge from a sequence of new tasks. However, most prior work explores only streams of homogeneous tasks (e.g., only classification tasks) and neglects the scenario of learning across heterogeneous tasks that posse…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.5

Auto-Stabilized Weak Galerkin Finite Element Methods for Biot's consolidation model on Non-Convex Polytopal Meshes

2026-03-29 · Chunmei Wang, Shangyou Zhang

Research Track A

This paper presents an auto-stabilized weak Galerkin (WG) finite element method for Biot's consolidation model within the classical displacement-pressure two-field formulation. Unlike traditional WG approaches, the proposed scheme achieves numerical stability without requiring traditional stabilizers. Spat…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

2026-03-30 · Ryan Po, David Junhao Zhang, Amir Hertz, Gordon Wetzstein, Neal Wadhwa, Nataniel Ruiz

General AI

Video world models have shown immense promise for interactive simulation and entertainment, but current systems still struggle with two important aspects of interactivity: user control over the environment for reproducible, editable experiences, and shared inference where players hold influence over a common world. To …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

2026-04-06 · Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu, Xiangyan Liu, Zhenlong Yuan, Tianyu Pang, Michael Qizhe Shieh, Fengze Liu, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie

General AI

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack s…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

2026-04-07 · Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu

General AI

We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific beha…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Introspective Diffusion Language Models

2026-04-13 · Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu

General AI

Diffusion language models (DLMs) promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction

2026-04-13 · Efstathios Karypidis, Spyros Gidaris, Nikos Komodakis

General AI

Accurate future video prediction requires both high visual fidelity and consistent scene semantics, particularly in complex dynamic environments such as autonomous driving. We present Re2Pix, a hierarchical video prediction framework that decomposes forecasting into two stages: semantic representation prediction and re…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.5

Statehood Without Capacity

2026-04-13 · Rok Spruk

Research Track A

This paper develops a political-economy theory of statehood without capacity. I argue that under specific institutional and geopolitical conditions, a polity can become trapped in an equilibrium of nominal statehood: a state in which claims to sovereignty, external recognition, and symbolic legitimacy persist or even s…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

C2: Scalable Rubric-Augmented Reward Modeling from Binary Preferences

2026-04-15 · Akira Kawabata, Saku Sugawara

General AI

Rubric-augmented verification guides reward models with explicit evaluation criteria, yielding more reliable judgments than single-model verification. However, most existing methods require costly rubric annotations, limiting scalability. Moreover, we find that rubric generation is vulnerable to a failure of cooperatio…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

2026-04-17 · Jiaxi Bi, Tongxu Luo, Wenyu Du, Zhengyang Tang, Benyou Wang

General AI

Parallel reasoning enhances Large Reasoning Models (LRMs) but incurs prohibitive costs due to futile paths caused by early errors. To mitigate this, path pruning at the prefix level is essential, yet existing research remains fragmented without a standardized framework. In this work, we propose the first systematic tax…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.5

From Papers to Progress: Rethinking Knowledge Accumulation in Software Engineering

2026-04-17 · Jason Cusati, Chris Brown

Research Track A

Software engineering research has experienced rapid growth in both output and participation over the past decades. Yet concerns persist about the field's ability to accumulate, integrate, and reuse knowledge in ways that support long-term progress. To better understand how the community itself perceives these challenge…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.5

Analyzing Process Data from Computer-Based Assessments: A Tutorial on Preprocessing, Feature Extraction, and Model-Based Inference

2026-04-18 · Daeun Hwangbo, Junyeong Park, Minjeong Jeon, Ick Hoon Jin

Research Track A

Computer-based assessments routinely generate detailed interaction logs -- commonly referred to as process data -- that record every action a respondent performs during task completion, yet systematic preprocessing guidance, integrated analytical workflows, and cross-method consistency checks remain scarce in the liter…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.5

Can Institutional Integration of Western Balkans Stock Exchanges Strengthen Monetary Transmission?

2026-04-20 · Stefan Tanevski

Research Track A

This paper asks how institutional stock-market integration reshapes the transmission of monetary policy through asset prices in small open economies. Motivated by the persistent segmentation of Western Balkan capital markets, we develop a two-stage counterfactual transmission framework to identify how stock-exchange co…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

The Last Harness You'll Ever Build

2026-04-22 · Haebin Seong, Li Yin, Haoran Zhang

General AI

AI agents are increasingly deployed on complex, domain-specific workflows -- navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling c…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think

2026-04-25 · Bingda Tang, Yuhui Zhang, Xiaohan Wang, Jiayuan Mao, Ludwig Schmidt, Serena Yeung-Levy

General AI

Aligning denoising generative models with human preferences or verifiable rewards remains a key challenge. While policy-gradient online reinforcement learning (RL) offers a principled post-training framework, its direct application is hindered by the intractable likelihoods of these models. Prior work therefore either …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

2026-04-26 · Zhen Ye, Xu Tan, Aoxiong Yin, Hongzhan Lin, Guangyan Zhang, Peiwen Sun, Yiming Li, Chi-Min Chan, Wei Ye, Shikun Zhang, Wei Xue

General AI

Joint audio-video generation models have shown that unified generation yields stronger cross-modal coherence than cascaded approaches. However, existing models couple modalities throughout denoising via pervasive attention, treating high-level semantics and low-level details in a fully entangled manner. This is subopti…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

A Survey on LLM-based Conversational User Simulation

2026-04-27 · Bo Ni, Leyao Wang, Yu Wang, Branislav Kveton, Franck Dernoncourt, Yu Xia, Hongjie Chen, Reuben Leura, Samyadeep Basu, Subhojyoti Mukherjee, Puneet Mathur, Nesreen Ahmed, Junda Wu, Li Li, Huixin Zhang, Ruiyi Zhang, Tong Yu, Sungchul Kim, Jiuxiang Gu, Zhengzhong Tu, Alexa Siu, Zichao Wang, David Seunghyun Yoon, Nedim Lipka, Namyong Park, Zihao Lin, Trung Bui, Yue Zhao, Tyler Derr, Ryan A. Rossi

General AI

User simulation has long played a vital role in computer science due to its potential to support a wide range of applications. Language, as the primary medium of human communication, forms the foundation of social interaction and behavior. Consequently, simulating conversational behavior has become a key area of study.…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

A Systematic Post-Train Framework for Video Generation

2026-04-28 · Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, Yuming Li, Xiaoxuan He, Mengzhao Chen, Haoyang Huang, Nan Duan, Ping Luo

General AI

While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deployment requirements due to critical issues such as prompt sensitivity, temporal inconsistency…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

2026-04-28 · Jiayi Guo, Linqing Wang, Jiangshan Wang, Yang Yue, Zeyu Liu, Zhiyuan Zhao, Qinglin Lu, Gao Huang, Chunyu Wang

General AI

Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their initial generation, potentially extending the performance upper bound. Current UMM-based refinement methods primarily…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

2026-04-29 · Jun Guo, Qiwei Li, Peiyan Li, Zilong Chen, Nan Sun, Yifei Su, Heyun Wang, Yuan Zhang, Xinghang Li, Huaping Liu

General AI

We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM) that only model 2D pixel-space and fail to balance action effic…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Instruction-Guided Poetry Generation in Arabic and Its Dialects

2026-04-30 · Abdelrahman Sadallah, Kareem Elozeiri, Mervat Abassy, Rania Elbadry, Mohamed Anwar, Abed Alhakim Freihat, Preslav Nakov, Fajri Koto

General AI

Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of expression and cultural identity. While modern Arabic speakers continue to value poetry, existing research on Arabic poetry within Large Language Models (LLMs) has primarily focused on analysis tasks such as interpretation or m…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

2026-05-29 · Stine Lyngsø Beltoft, William Brach, Federico Torrielli, Jacob Nielsen, Annemette Brok Pirchert, Filippo Tonini, Peter Schneider-Kamp, Lukas Galke Poech

Research Track A · General AI

Monitoring autonomous language model agents currently relies mostly on surface behavior. But what happens when agent populations invent new languages with the goal of avoiding human oversight? Here, we study the emergent languages on Moltbook. For this, we build upon the Moltbook Files dataset and apply a two-stage app…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

RobotValues: Evaluating Household Robots When Human Values Conflict

2026-06-02 · Jongwook Han, Hyeongjin Kim, Yohan Jo

General AI

While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet, there are no benchma…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Complexity-Balanced Diffusion Splitting

2026-06-04 · Noam Issachar, Dani Lischinski, Raanan Fattal

General AI

Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative timeline is inherent…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

2026-06-04 · Rui Zhao, Kaiming Yang, Jifeng Zhu, Siyang Chen, Ziqi Wang, Weijia Wu, Kevin Qinghong Lin, Heng Wang, Mike Zheng Shou

General AI

Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? We propose robotic ma…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

A Stationary (and Therefore Compatible) Representation is All You Need

2026-06-10 · Niccolò Biondi, Federico Pernici, Simone Ricci, Alberto Del Bimbo

General AI

Learning compatible representations aims to learn feature representations that can be used interchangeably over time whenever a model undergoes updates. In this paper, we demonstrate that stationary representations learned by d-Simplex fixed classifiers imply compatibility as in its formal definition. This result estab…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Rethinking the Role of Efficient Attention in Hybrid Architectures

2026-06-13 · Ziqing Qiao, Yinuo Xu, Chaojun Xiao, Zhou Su, Zihan Zhou, Yingfa Chen, Xiaoyue Xu, Xu Han, Zhiyuan Liu

General AI

Modern language models increasingly adopt hybrid architectures that combine full attention with efficient attention modules, such as sliding-window attention (SWA) and recurrent sequence mixers. However, how these efficient modules shape model capabilities remains poorly understood. To address this gap, we conduct a sy…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

SP^3: Spherical Priors for Plug-and-Play Restoration

2026-06-15 · Sean Man, Ron Raphaeli, Matan Kleiner, Or Ronai

General AI

In this paper, we introduce SP^3, a novel Plug-and-Play algorithm that accelerates maximum a posteriori image restoration by replacing denoisers with Spherical Encoders (SE) as generative priors. SP^3 approximates the intractable proximal prior step by utilizing the SE's tightly structured latent space as a robust projec…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Text-Vision Co-Instructed Image Editing

2026-06-15 · Chenxi Xie, Yuhui Wu, Qiaosi Yi, Lei Zhang

General AI

Existing image editing methods can be generally categorized into textual instruction-based and visual prompt-based ones. Textual instructions are semantically expressive, but are limited by the coarse granularity of spatial control of the editing results. In contrast, visual prompts such as drag and point can provide p…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

2026-06-16 · Weichen Fan, Haiwen Diao, Penghao Wu, Ziwei Liu

General AI

Pixel-space diffusion models are trained on full-bandwidth noisy images, yet the useful signal available to the denoiser is strongly frequency dependent. Under rectified-flow diffusion and natural-image power-law spectra, the per-band data-to-noise contour k^{*}(t) = (1-t)^{-2/α} separates a signal-bearing low-frequenc…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

2026-06-16 · Jingyuan Huang, Zuming Huang, Yucheng Shi, Tianze Yang, Xiaoming Zhai, Wei Chu, Ninghao Liu

General AI

Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promising post-training approach for this coordinate-sensitive task, since it provides dense to…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.5

Physics-IQ Verified

2026-06-17 · Tim Rädsch, Yuki M Asano, Hilde Kuehne, Stefan Bauer, Priyank Jaini, Robert Geirhos, Carsten T. Lüth

General AI

Video generative models (VGMs) have become a new frontier that can be used not just for video generation but for a multitude of downstream tasks, including world modeling. To advance these tasks, a good video model must understand the physical reality of the world. Evaluating this understanding is an emerging field an…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

2026-03-23 · Alexandra Zelenin, Alexandra Zhuravlyova

General AI

Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction, but its forward pass requires the row-wise norm of W + sBA, a computation that every major framework we surveyed implements by materializing the dense [d_out, d_in] product BA. At d_in = 8192 and rank r = 384, a sin…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 6.3

Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level Dropin & Neuroplasticity Mechanisms

2026-03-25 · Yupei Li, Shuaijie Shao, Manuel Milling, Björn Schuller

General AI

Current audio deepfake detection has achieved remarkable performance using diverse deep learning architectures such as ResNet, and has seen further improvements with the introduction of large models (LMs) like Wav2Vec. The success of large language models (LLMs) further demonstrates the benefits of scaling model parame…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers

2026-03-26 · Mingmeng Geng, Yuhang Dong, Thierry Poibeau

General AI

Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Due to the…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

PixelSmile: Toward Fine-Grained Facial Expression Editing

2026-03-26 · Jiabin Hua, Hengyuan Xu, Aojie Li, Wei Cheng, Gang Yu, Xingjun Ma, Yu-Gang Jiang

General AI

Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off b…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Self-Improvement of Large Language Models: A Technical Overview and Future Outlook

2026-03-26 · Haoyan Yang, Mario Xerri, Solha Park, Huajian Zhang, Yiyang Feng, Sai Akhil Kogilathota, Jiawei Zhou

General AI

As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for further improvement. …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Adaptive Block-Scaled Data Types

2026-03-30 · Jack Cook, Hyemin S. Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P. Chandrakasan, Song Han

General AI

NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distributio…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining

2026-03-30 · Anuj Diwan, Eunsol Choi, David Harwath

General AI

We introduce ParaSpeechCLAP, a dual-encoder contrastive model that maps speech and text style captions into a common embedding space, supporting a wide range of intrinsic (speaker-level) and situational (utterance-level) descriptors (such as pitch, texture and emotion) far beyond the narrow set handled by existing mode…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

SAGAI-MID: A Generative AI-Driven Middleware for Dynamic Runtime Interoperability

2026-03-30 · Oliver Aleksander Larsen, Mahyar T. Moghaddam

General AI

Modern distributed systems integrate heterogeneous services, REST APIs with different schema versions, GraphQL endpoints, and IoT devices with proprietary payloads that suffer from persistent schema mismatches. Traditional static adapters require manual coding for every schema pair and cannot handle novel combinations …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Temporal Credit Is Free

2026-03-30 · Aur Shalev Merin

General AI

Recurrent networks do not need Jacobian propagation to adapt online. The hidden state already carries temporal credit through the forward pass; immediate derivatives suffice if you stop corrupting them with stale trace memory and normalize gradient scales across parameter groups. An architectural rule predicts when nor…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

XSPA: Crafting Imperceptible X-Shaped Sparse Adversarial Perturbations for Transferable Attacks on VLMs

2026-03-30 · Chengyin Hu, Jiaju Han, Xuemeng Sun, Qike Zhang, Yiwei Wei, Ang Li, Chunlei Meng, Xiang Chen, Jiahuan Long

General AI

Vision-language models (VLMs) rely on a shared visual-textual representation space to perform tasks such as zero-shot classification, image captioning, and visual question answering (VQA). While this shared space enables strong cross-task generalization, it may also introduce a common vulnerability: small visual pertur…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Multimodal Higher-Order Brain Networks: A Topological Signal Processing Perspective

2026-03-31 · Breno C. Bispo, Stefania Sardellitti, Juliano B. Lima, Fernando A. N. Santos

General AI

Brain connectomics is still largely dominated by pairwise-based models, such as graphs, which cannot represent circulatory or higher-order functional interactions. In this paper, we propose a multimodal framework based on Topological Signal Processing (TSP) that models the brain as a higher-order topological domain and…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Quantization with Unified Adaptive Distillation to enable multi-LoRA based one-for-all Generative Vision Models on edge

2026-03-31 · Sowmya Vajrala, Aakash Parmar, Prasanna R, Sravanth Kodavanti, Manjunath Arveti, Srinivas Soumitri Miriyala, Ashok Senapati

General AI

Generative Artificial Intelligence (GenAI) features such as image editing, object removal, and prompt-guided image transformation are increasingly integrated into mobile applications. However, deploying Large Vision Models (LVMs) for such tasks on resource-constrained devices remains challenging due to their high memor…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Tucker Attention: A generalization of approximate attention mechanisms

2026-03-31 · Timon Klein, Jonas Kusch, Sebastian Sager, Stefan Schnake, Steffen Schotthöfer

General AI

The pursuit of reducing the memory footprint of the self-attention mechanism in multi-head self-attention (MHA) has spawned a rich portfolio of methods, e.g., grouped-query attention (GQA) and multi-head latent attention (MLA). The methods leverage specialized low-rank factorizations across embedding dimensions or attentio…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

BVFLMSP: Bayesian Vertical Federated Learning for Multimodal Survival with Privacy

2026-04-02 · Abhilash Kar, Basisth Saha, Tanmay Sen, Biswabrata Pradhan

General AI

Multimodal time-to-event prediction often requires integrating sensitive data distributed across multiple parties, making centralized model training impractical due to privacy constraints. At the same time, most existing multimodal survival models produce single deterministic predictions without indicating how confiden…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning

2026-04-02 · Sten Rüdiger, Sebastian Raschka

General AI

Minor Component Adaptation (MiCA) is a novel parameter-efficient fine-tuning method for large language models that focuses on adapting underutilized subspaces of model representations. Unlike conventional methods such as Low-Rank Adaptation (LoRA), which target dominant subspaces, MiCA leverages Singular Value Decompos…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives

2026-04-02 · Hao Zhu, Di Zhou, Donna Slonim

General AI

Understanding causal dependencies in observational data is critical for informing decision-making. These relationships are often modeled as Bayesian Networks (BNs) and Directed Acyclic Graphs (DAGs). Existing methods, such as NOTEARS and DAG-GNN, often face issues with scalability and stability in high-dimensional data…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

ClickAIXR: On-Device Multimodal Vision-Language Interaction with Real-World Objects in Extended Reality

2026-04-06 · Dawar Khan, Alexandre Kouyoumdjian, Xinyu Liu, Omar Mena, Dominik Engel, Ivan Viola

General AI

We present ClickAIXR, a novel on-device framework for multimodal vision-language interaction with objects in extended reality (XR). Unlike prior systems that rely on cloud-based AI (e.g., ChatGPT) or gaze-based selection (e.g., GazePointAR), ClickAIXR integrates an on-device vision-language model (VLM) with a controlle…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

FAVE: Flow-based Average Velocity Establishment for Sequential Recommendation

2026-04-06 · Ke Shi, Yao Zhang, Feng Guo, Jinyuan Zhang, JunShuo Zhang, Shen Gao, Shuo Shang

General AI

Generative recommendation has emerged as a transformative paradigm for capturing the dynamic evolution of user intents in sequential recommendation. While flow-based methods improve the efficiency of diffusion models, they remain hindered by the "Noise-to-Data" paradigm, which introduces two critical inefficiencies: …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

2026-04-07 · Qimin Zhong, Hao Liao, Haiming Qin, Mingyang Zhou, Rui Mao, Wei Chen, Naipeng Chao

General AI

Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical pers…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries

2026-04-07 · Andrew Kurtz, Klaudia Krawiecka

General AI

The governance of artificial intelligence has a blind spot: the machine identities that AI systems use to act. AI agents, service accounts, API tokens, and automated workflows now outnumber human identities in enterprise environments by ratios exceeding 80 to 1, yet no integrated framework exists to govern them. A sing…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

A Little Rank Goes a Long Way: Random Scaffolds with LoRA Adapters Are All You Need

2026-04-09 · Hananel Hazan, Yanbo Zhang, Benedikt Hartl, Michael Levin

General AI

How many of a neural network's parameters actually encode task-specific information? We investigate this question with LottaLoRA, a training paradigm in which every backbone weight is drawn at random and frozen; only low-rank LoRA adapters are trained. Across nine benchmarks spanning diverse architecture families from …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Bridging the Gap between Micro-scale Traffic Simulation and 4D Digital Cityscapes

2026-04-09 · Longxiang Jiao, Lukas Hofmann, Yiru Yang, Zhanyi Wu, Jonas Egeler

General AI

While micro-scale traffic simulations provide essential data for urban planning, they are rarely coupled with the high-fidelity visualization or auralization necessary for effective stakeholder communication. In this work, we present a real-time 4D visualization framework that couples the SUMO traffic with a photoreali…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Fail2Drive: Benchmarking Closed-Loop Driving Generalization

2026-04-09 · Simon Gerstenecker, Andreas Geiger, Katrin Renz

General AI

Generalization under distribution shift remains a central bottleneck for closed-loop autonomous driving. Although simulators like CARLA enable safe and scalable testing, existing benchmarks rarely measure true generalization: they typically reuse training scenarios at test time. Success can therefore reflect memorizati…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction

2026-04-09 · Tao Xie, Peishan Yang, Yudong Jin, Yingfeng Cai, Wei Yin, Weiqiang Ren, Qian Zhang, Wei Hua, Sida Peng, Xiaoyang Guo, Xiaowei Zhou

General AI

This paper addresses the task of large-scale 3D scene reconstruction from long video sequences. Recent feed-forward reconstruction models have shown promising results by directly regressing 3D geometry from RGB images without explicit 3D priors or geometric constraints. However, these methods often struggle to maintain…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal

2026-04-09 · Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha

General AI

Applying steering vectors to large language models (LLMs) is an efficient and effective model alignment technique, but we lack an interpretable explanation for how it works, specifically what internal mechanisms steering vectors affect and how this results in different model outputs. To investigate the causal mechani…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift

2026-04-10 · Harshith Kethavath, Weiming Hu

General AI

Adapting vision-language models to remote sensing imagery presents a fundamental challenge: both the visual and linguistic distributions of satellite data lie far outside natural image pretraining corpora. Despite this, prompting remains the dominant deployment paradigm, driven by the assumption that domain-specific la…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

FishRoPE: Projective Rotary Position Embeddings for Omnidirectional Visual Perception

2026-04-12 · Rahul Ahuja, Mudit Jain, Bala Murali Manoghar Sai Sudhakar, Venkatraman Narayanan, Pratik Likhar, Varun Ravi Kumar, Senthil Yogamani

General AI

Vision foundation models (VFMs) and Bird's Eye View (BEV) representation have advanced visual perception substantially, yet their internal spatial representations assume the rectilinear geometry of pinhole cameras. Fisheye cameras, widely deployed on production autonomous vehicles for their surround-view coverage, exhi…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

A Mamba-Based Multimodal Network for Multiscale Blast-Induced Rapid Structural Damage Assessment

2026-04-13 · Wanli Ma, Sivasakthy Selvakumaran, Dain G. Farrimond, Adam A. Dennis, Samuel E. Rigby

General AI

Accurate and rapid structural damage assessment (SDA) is crucial for post-disaster management, helping responders prioritise resources, plan rescues, and support recovery. Traditional field inspections, though precise, are limited by accessibility, safety risks, and time constraints, especially after large explosions. …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net

2026-04-13 · Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono

General AI

Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep learning-based auto-segmentation can reduce workload, safe clinical de…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Psychological Concept Neurons: Can Neural Control Bias Probing and Shift Generation in LLMs?

2026-04-13 · Yuto Harada, Hiro Taiyo Hamada

General AI

Using psychological constructs such as the Big Five, large language models (LLMs) can imitate specific personality profiles and predict a user's personality. While LLMs can exhibit behaviors consistent with these constructs, it remains unclear where and how they are represented inside the model and how they relate to b…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Generative Refinement Networks for Visual Synthesis

2026-04-14 · Jian Han, Jinlai Liu, Jiahuan Wang, Bingyue Peng, Zehuan Yuan

General AI

While diffusion models dominate the field of visual generation, they are computationally inefficient, applying a uniform computational effort regardless of different complexity. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but are often hindered by…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Representation geometry shapes task performance in vision-language modeling for CT enterography

2026-04-14 · Cristian Minoccheri, Emily Wittrup, Kayvan Najarian, Ryan Stidham

General AI

Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (IBD), yet the representational choices that best support automated analysis of this modality are unknown. We present the first study of vision-language transfer learning on abdominal CT enterography and identif…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

2026-04-14 · Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding

General AI

On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We first identify that two conditions govern whether OPD succeeds or fails: (i) the s…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance

2026-04-16 · Jack Wei Lun Shi, Minghao Dang, Wawan Solihin, Justin K. W. Yeoh

General AI

Existing research on large language models (LLMs) for automated code compliance has primarily focused on performance, treating the models as black boxes and overlooking how training decisions affect their interpretive behavior. This paper addresses this gap by employing a perturbation-based attribution analysis to comp…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

MOMENTA: Mixture-of-Experts Over Multimodal Embeddings with Neural Temporal Aggregation for Misinformation Detection

2026-04-17 · Yeganeh Abdollahinejad, Ahmad Mousavi, Naeemul Hassan, Kai Shu, Nathalie Japkowicz, Shahriar Khosravi, Amir Karami

General AI

The widespread dissemination of multimodal content on social media has made misinformation detection increasingly challenging, as misleading narratives often arise not only from textual or visual content alone, but also from semantic inconsistencies between modalities and their evolution over time. Existing multimodal …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

No Universal Courtesy: A Cross-Linguistic, Multi-Model Study of Politeness Effects on LLMs Using the PLUM Corpus

2026-04-17 · Hitesh Mehta, Arjit Saxena, Garima Chhikara, Rohit Kumar

Research Track A · General AI

This paper explores the response of Large Language Models (LLMs) to user prompts with different degrees of politeness and impoliteness. The Politeness Theory by Brown and Levinson and the Impoliteness Framework by Culpeper form the basis of experiments conducted across three languages (English, Hindi, Spanish), five mo…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

PIIBench: A Unified Multi-Source Benchmark Corpus for Personally Identifiable Information Detection

2026-04-17 · Pritesh Jha

General AI

We present PIIBench, a unified benchmark corpus for Personally Identifiable Information (PII) detection in natural language text. Existing resources for PII detection are fragmented across domain-specific corpora with mutually incompatible annotation schemes, preventing systematic comparison of detection systems. We co…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Repurposing 3D Generative Model for Autoregressive Layout Generation

2026-04-17 · Haoran Feng, Yifan Niu, Zehuan Huang, Yang-Tian Sun, Chunchao Guo, Yuxin Peng, Lu Sheng

General AI

We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in the native 3D space, formulating layout generation as an autoregressive process that explicitly models geometric rela…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Spinning Living Crystals of Run-and-Tumble Particles with Environmental Feedback

2026-04-17 · Maks Pečnik Bambič, Nuno A. M. Araújo, Giorgio Volpe

General AI

Collective rotations are common in active matter, enhancing cohesion, transport, and mixing. They are typically attributed to chiral non-reciprocal dynamics due to intrinsic particle chirality, torque-generating interactions among units, or geometric confinement. Here, we uncover a different mechanism for rotational or…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition

2026-04-19 · Nwe Ni Win, Jim Basilakis, Steven Thomas, Seyhan Yazar, Laura Pierce, Stephanie Liu, Paul M. Middleton, Nasser Ghadiri, X. Rosalind Wang

General AI

Extracting clinically relevant information from unstructured medical narratives such as admission notes, discharge summaries, and emergency case histories remains a challenge in clinical natural language processing (NLP). Medical Entity Recognition (MER) identifies meaningful concepts embedded in these records. Recent …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints

2026-04-20 · Hao Meng, Siyuan Zheng, Shuran Zhou, Qiangqiang Wang, Yang Song

General AI

Large Language Models (LLMs) show promise in lyric-to-melody generation, but models trained with Supervised Fine-Tuning (SFT) often produce musically implausible melodies with issues like poor rhythm and unsuitable vocal ranges, a phenomenon we term "constraint violation". To address this, we propose a novel alignment …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Bounded Ratio Reinforcement Learning

2026-04-20 · Yunke Ao, Le Chen, Bruce D. Lee, Assefa S. Wahd, Aline Czarnobai, Philipp Fürnstahl, Bernhard Schölkopf, Andreas Krause

General AI

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying foundations of trust region methods and the heuristic clipped objective used in P…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance

2026-04-20 · Terence Lim, Kumar Muthuraman, Michael Sury

General AI

We introduce a multi-agent framework intended to emulate parts of a quantitative research team and support equity factor research on large financial panel datasets. QRAFTI integrates a research toolkit for panel data with MCP servers that expose data access, factor construction, and custom coding operations as callable…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

A Gesture-Based Visual Learning Model for Acoustophoretic Interactions using a Swarm of AcoustoBots

2026-04-21 · Alex Lin, Lei Gao, Narsimlu Kemsaram, Sriram Subramanian

General AI

AcoustoBots are mobile acoustophoretic robots capable of delivering mid-air haptics, directional audio, and acoustic levitation, but existing implementations rely on scripted commands and lack an intuitive interface for real-time human control. This work presents a gesture-based visual learning framework for contactles…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

An AI Agent Execution Environment to Safeguard User Data

2026-04-21 · Robert Stanley, Avi Verma, Lillian Tsai, Konstantinos Kallas, Sam Kumar

General AI

AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g., personal and financial information). This poses a serious risk to security and privacy. Adversaries may attack the AI model (e.g., via prompt injection) to exfiltrate user da…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Global Hopf Bifurcation and Symmetric Periodic Solutions in Multi-Agent Systems with Neutral Distributed Delays

2026-04-22 · Casey Crane

General AI

We study the emergence of symmetric oscillatory behavior in multi-agent systems where each agent incorporates a continuous memory of its past states and past rates of change, modeled by distributed retarded and neutral delays. The closed-loop dynamics are described by a system of nonlinear neutral functional differenti…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Global Offshore Wind Infrastructure: Deployment and Operational Dynamics from Dense Sentinel-1 Time Series

2026-04-22 · Thorsten Hoeser, Felix Bachofer, Claudia Kuenzer

General AI

The offshore wind energy sector is expanding rapidly, increasing the need for independent, high-temporal-resolution monitoring of infrastructure deployment and operation at global scale. While Earth Observation based offshore wind infrastructure mapping has matured for spatial localization, existing open datasets lack …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Relative Principals, Pluralistic Alignment, and the Structural Value Alignment Problem

2026-04-22 · Travis LaCroix

General AI

The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but w…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation

2026-04-22 · Ruohan Liu, Shukang Yin, Tao Wang, Dong Zhang, Weiji Zhuang, Shuhuai Ren, Ran He, Caifeng Shan, Chaoyou Fu

General AI

Paralinguistic cues are essential for natural human-computer interaction, yet their evaluation in Large Audio-Language Models (LALMs) remains limited by coarse feature coverage and the inherent subjectivity of assessment. To address these challenges, we introduce SpeechParaling-Bench, a comprehensive benchmark for para…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Evaluation of Automatic Speech Recognition Using Generative Large Language Models

2026-04-23 · Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil, Sergio Burdisso, Petr Motlicek, Shiran Liu, Mickael Rouvier, Jane Wottawa, Richard Dufour

General AI

Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This paper evaluates their …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

First measurement of wind line formation regions in an early O-type star

2026-04-23 · D. Pauli, T. N. Parsons, R. K. Prinja

General AI

Massive stars with their strong ionizing radiation and strong stellar winds are the key feedback agents of the universe. Stellar winds of massive stars are often measured by fitting resonance lines in the UV using non-LTE stellar atmosphere models. So far, the line formation regions of these lines have not been measure…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation

2026-04-23 · Bartosz Balis, Michal Orzechowski, Piotr Kica, Michal Dygas, Michal Kuszewski

General AI

Scientific workflow systems automate execution (scheduling, fault tolerance, resource management) but not the semantic translation that precedes it. Scientists still manually convert research questions into workflow specifications, a task requiring both domain knowledge and infrastructure expertise. We propose an a…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms

2026-04-23 · Yuto Nishida, Naoki Shikoda, Yosuke Kishinami, Ryo Fujii, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe

General AI

Understanding what kinds of factual knowledge large language models (LLMs) memorize is essential for evaluating their reliability and limitations. Entity-based QA is a common framework for analyzing non-verbatim memorization, but typical evaluations query each entity using a single canonical surface form, making it dif…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Scalable Multimodal Beam Alignment in V2X: An Anti-Imbalance Graph Learning Approach

2026-04-23 · Jiahui Liang, Shuoyao Wang, Shijian Gao

General AI

Efficient beam alignment is fundamental to high-throughput and reliable connectivity in Vehicle-to-Everything (V2X) systems. However, conventional beam management in dynamic vehicular topologies incurs prohibitive alignment overhead and struggles to maintain robust links under rapid mobility. To overcome these challeng…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication

2026-04-23 · Haolin Zhang, William Reber, Yuxuan Zhang, Guofei Gu, Jeff Huang

General AI

Modern phishing campaigns increasingly evade snapshot-based URL classifiers using interaction gates (e.g., checkbox/slider challenges), delayed content rendering, and logo-less credential harvesters. This shifts URL triage from static classification toward an interactive forensics task: an analyst must actively navigat…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Dharma, Data and Deception: An LLM-Powered Rhetorical Analysis of Cow-Urine Health Claims on YouTube

2026-04-24 · Sheza Munir, Ratna Kandala, Anamta Khan, Deepti, Joyojeet Pal

General AI

Health misinformation remains one of the most pressing challenges on social media, particularly when cultural traditions intersect with scientific-sounding claims. These dynamics are not only global but also deeply local, manifesting in culturally specific controversies that require careful analysis. Motivated by this,…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification

2026-04-24 · Md Erfan, Md Kamal Hossain Chowdhury, Ahmed Ryan, Md Rayhanur Rahman

General AI

Large Language Models (LLMs) show promise in automated software engineering, yet their guarantee of correctness is frequently undermined by erroneous or hallucinated code. To enforce model honesty, formal verification requires LLMs to synthesize implementation logic alongside formal specifications that are subsequently…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Inter-Stance: A Dyadic Multimodal Corpus for Conversational Stance Analysis

2026-04-24 · Xiang Zhang, Xiaotian Li, Taoyue Wang, Nan Bi, Xin Zhou, Cody Zhou, Zoie Wang, Andrew Yang, Yuming Su, Jeff Cohn, Qiang Ji, Lijun Yin

General AI

Social interactions dominate our perceptions of the world and shape our daily behavior by attaching social meaning to acts as simple and spontaneous as gestures, facial expressions, voice, and speech. People mimic and otherwise respond to each other's postures, facial expressions, mannerisms, and other verbal and nonve…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

PASS: A Provenanced Access Subaccount System for Blockchain Wallets

2026-04-24 · Jay Yu, Shunfan Zhou, Hang Yin, Brian Seong

General AI

Blockchain wallets conventionally follow an ownership model where possession of a private key grants unilateral control. However, this assumption is brittle for emerging settings such as AI agent wallets, organizational custody, and enterprise payroll, where multiple actors must coordinate without exposing secrets or l…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities

2026-04-24 · Ilana Nguyen, Harini Suresh, Thema Monroe-White, Evan Shieh

General AI

Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. While many works highlight the new potential applications of LLMs, there are risks of LLMs encoding and perpetuating h…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents

2026-04-27 · German Marin, Jatin Chaudhary

General AI

Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the Informational Viability Principle: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) +…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Sentiment and Emotion Classification of Indonesian E-Commerce Reviews via Multi-Task BiLSTM and AutoML Benchmarking

2026-04-27 · Hermawan Manurung, Ibrahim Al-Kahfi, Ahmad Rizqi, Martin Clinton Tosima Manullang

General AI

Indonesian marketplace reviews mix standard vocabulary with slang, regional loanwords, numeric shorthands, and emoji, making lexicon-based sentiment tools unreliable in practice. This paper describes a two-track classification pipeline applied to the PRDECT-ID dataset, which contains 5,400 product reviews from 29 Indon…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

The Nonverbal Syntax Framework: An Evidence-Based Tiered System for Inferring Learner States from Observable Behavioral Cues

2026-04-28 · Sherzod Turaev, Mary John, Jaloliddin Rustamov, Zahiriddin Rustamov, Saja Aldabet, Nazar Zaki, Khaled Shuaib

General AI

Understanding learners' cognitive and affective states underpins adaptive educational systems and effective teaching. Although research links nonverbal cues to internal states, no framework calibrates them to evidence. We present the Nonverbal Syntax Framework, drawn from a systematic review of 908 studies and 17,043 c…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Characterizing the Consistency of the Emergent Misalignment Persona

2026-04-30 · Anietta Weckauff, Yuchen Zhang, Maksym Andriushchenko

General AI

Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlation between harmful behavior and self-assessment in emergently misaligned models, it remains unclear how consistent this c…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

2026-04-30 · Yujun Wu, Dongxu Zhang, Xinchen Li, Jinhang Xu, Yiling Duan, Yumou Liu, Jiabao Pan, Xuanhe Zhou, Jingxuan Wei, Siyuan Li, Jintao Chen, Conghui He, Cheng Tan

General AI

Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one anothe…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

MIFair: A Mutual-Information Framework for Intersectionality and Multiclass Fairness

2026-04-30 · Jeanne Monnier, Thomas George, Frédéric Guyard, Christèle Tarnec, Marios Kountouris

General AI

Fairness in machine learning remains challenging due to its ethical complexity, the absence of a universal definition, and the need for context-specific bias metrics. Existing methods still struggle with intersectionality, multiclass settings, and limited flexibility and generality. To address these gaps, we introduce …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Multisensory learning recruits visual neurons into an olfactory memory engram

2026-04-30 · Zeynep Okray, Nils Otto, Anna A. Cook, Clifford Talbot, Ashwin Miriyala, Martín Klappenbach, Ciara Stern, Kieran Desmond, Paola Vargas-Gutierrez, Scott Waddell

General AI

Associating multiple sensory cues with a single experience or object is a fundamental process that improves object recognition and memory performance. However, neural mechanisms that bind sensory features during learning and augment memory expression are unknown. Here we demonstrate multisensory appetitive and aversive…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

On the Proper Treatment of Units in Surprisal Theory

2026-04-30 · Samuel Kiegeland, Vésteinn Snæbjarnarson, Tim Vieira, Ryan Cotterell

General AI

Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a unit underspecified. In practice, experimental stimuli are segmented into linguistically motivated units (e.g., words), while pretrained language models assign probability…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Modeling Subjective Urban Perception with Human Gaze

2026-05-01 · Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer

General AI

Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed.…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

2026-05-01 · Alfredo Madrid-García, Miguel Rujas

General AI

Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls. Objective: To re…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.3

Regret Minimization with Adaptive Opponents in Repeated Games

2026-06-04 · Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

General AI

In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce {\tt Repea…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Symmetric Divergence and Normalized Similarity: A Unified Topological Framework for Representation Analysis

2026-06-04 · Yan Wang, Tianyang Hu

General AI

Topological Data Analysis (TDA) offers a principled, intrinsic lens for comparing neural representations. However, existing paired topological divergences (e.g., RTD) are limited by heuristic asymmetry and, more critically, unbounded scores that depend on sample size, hindering reliable cross-scenario benchmarking. To …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

2026-06-08 · Avijit Ghosh, Anka Reuel, Jenny Chim, Wm. Matthew Kennedy, Srishti Yadav, Jennifer Mickel, Yanan Long, Andrew Tran, Anastassia Kornilova, Damian Stachura, Kevin Klyman, Felix Friedrich, Jeba Sania, Max Lamparth, Jan Batzner, Anoop Mishra, Eliya Habba, Yixiong Hao, Nathan Heath, Shalaleh Rismani, Usman Gohar, Andrea Loehr, David Manheim, Ruchira Dhar, Sree Harsha Nelaturu, Aarush Sinha, Leshem Choshen, Drishti Sharma, Ishan Khire, Amit Saha, Subramanyam Sahoo, Michael Hardy, Michael Alexander Riegler, Kabir Manghnani, Michelle Lin, Yanan Jiang, Yilin Huang, Asaf Yehudai, Jessica Ji, Aris Hofmann, Mubashara Akhtar, Nuno Moniz, Yacine Jernite, Stella Biderman, Zeerak Talat, Sanmi Koyejo, Mykel Kochenderfer, Irene Solaiman

General AI

AI evaluation results are produced at scale but reported inconsistently across leaderboards, model cards, benchmark papers, and company blogs. The cost is interpretive: readers cannot reliably compare results across sources, identify what a report omits, or trace an aggregate claim to its underlying evidence. Recent ef…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Human-Centred Risk Mitigation for AI-Mediated Information Manipulation: A SOCMINT Framework Based on Information Manipulation Sets

2026-06-08 · Antonio Scala

General AI

AI-mediated information manipulation increasingly takes the form of social cyber attacks that target trust, attention, credibility, reputation, and decision-making rather than only technical infrastructures or isolated false contents. Existing defensive approaches often oscillate between incident-level analysis, which …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Principled Uncertainty in Clinical AI: End-to-End Bayesian Modelling and Algorithmic Equity Auditing Across Multimodal Patient Data

2026-06-08 · Oladimeji Anthonio, Dimeji Abdulsobur Olawuyi, Oloruntoba Ajayi, Temiloluwa Aderemi, Joseph Odamo

General AI

Clinical artificial intelligence (AI) systems routinely produce predictions without principled quantification of uncertainty, limiting their trustworthiness in high-stakes medical environments. This paper presents an integrated research programme addressing two interconnected problems: (1) the development of a fully en…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

The Role of Feedback Alignment in Self-Distillation

2026-06-09 · Semih Kara, Oğuzhan Ersoy

General AI

Conditioning a language model on additional context, such as feedback on a previous attempt, typically improves its response. Self-distillation trains the model to retain this improvement when the context is not present. The method works by matching the model's output distribution under two settings: a student that see…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Improving Robotic Generalist Policies via Flow Reversal Steering

2026-06-11 · Andy Tang, William Chen, Andrew Wagenmaker, Chelsea Finn, Sergey Levine

General AI

Generalist policies can learn a wide range of skills from diverse robot datasets. In order to solve or improve on challenging new tasks, we need a way to infer and invoke the appropriate actions from the policy's rich behavioral prior, especially when directly commanding the policy fails. We focus on flow matching gen…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Human Universal Grasping

2026-06-15 · Kevin Yuanbo Wu, Tianxing Zhou, Isaac Tu, Billy Yan, Irmak Guzey, David Fouhey, Dandan Shan, Lerrel Pinto

General AI

Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is from humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-spec…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

2026-06-15 · Kareem Amin, Rudrajit Das, Alessandro Epasto, Adel Javanmard, Dennis Kraft, Mónica Ribero, Sergei Vassilvitskii

General AI

The rapid adoption of generative AI and Large Language Models (LLMs) has spurred interest in synthetic data as a privacy-preserving alternative to sensitive real-world datasets. However, generating high-utility synthetic data often carries the risk of memorizing and regurgitating private information from the training c…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

2026-06-16 · Rishit Dagli, Donglai Xiang, Vismay Modi, Xuning Yang, Gavriel State, David I. W. Levin, Maria Shugrina

General AI

Accurate mechanical properties (or materials), namely Young's modulus ($E$), Poisson's ratio ($ν$), and density ($ρ$), are essential for reliable physics simulation of digital worlds, but most 3D assets lack this information. We propose AdaVoMP, a method for predicting accurate dense spatially-varying ($E$, $ν$, $ρ$) for input 3…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

2026-06-16 · Ning Gao, Jinliang Zheng, Xing Gao, Haoxiang Ma, Hanqing Wang, Yukai Wang, Jiantong Chen, Zanxin Chen, Shujie Zhang, Mingda Jia, Xuekun Jiang, Zihou Zhu, Xinyu Li, Shuai Wang, Hao Li, Wenzhe Cai, Yuqiang Yang, Xudong Xu, Zhaoyang Lyu, Yao Mu, Tai Wang, Jiangmiao Pang, Jia Zeng, Weinan Zhang, Chunhua Shen

General AI

We present EBench, a simulation benchmark that diagnoses generalist mobile manipulation policies beyond a single success-rate scalar. EBench comprises 26 diverse and challenging manipulation tasks annotated along 5 capability dimensions and 4 generalization dimensions. We evaluate state-of-the-art generalist manipulati…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

MAJIC: Leveraging Articulatory Motion for Speech-based Emotion Recognition

2026-06-16 · Tanmay Srivastava, Paras Bhavnani, Benjir Alvee Islam, Shubham Jain

General AI

We introduce MAJIC, a multimodal emotion recognition system that leverages articulatory motion of the jaw and facial muscles for speech-based emotion recognition (SER). While most SER systems perform well on datasets with strongly expressed emotional speech of trained actors, their performance often degrades when emoti…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.3

CABLE: Cloud-Assisted Bandwidth-efficient LMM-based Encoding for V2X Systems

2026-06-17 · Haohua Que, Zhipeng Bao, Qianyi Wu, Handong Yao

General AI

Cloud-hosted large multimodal models (LMMs) can provide strong open-vocabulary perception for Vehicle-to-Everything systems, but naively transmitting full-resolution frames from edge to cloud causes severe communication overhead and high cloud-side prefill latency. We present CABLE, a cloud-assisted bandwidth-efficient…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.2

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

2026-06-23 · Zhuoren Ye, Tianyu Wo, Dinghao Xue, Mingming Zhang, Yuchen Teng, Chunming Hu, Renyu Yang

General AI

Emerging LLM services increasingly host many sparse MoE models, yet most models receive sparse requests and remain cold. This creates a GPU memory problem: model weights are stable and model-determined, while KV-cache is transient and demand-determined. Because cold models rarely reach peak KV-cache demand at the same …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.2

InSight: Self-Guided Skill Acquisition via Steerable VLAs

2026-06-23 · Maggie Wang, Lars Osterberg, Stephen Tian, Ola Shorinwa, Jiajun Wu, Mac Schwager

General AI

Vision-language-action (VLA) models can learn manipulation skills from demonstrations, but their capabilities are bounded by the skills in the training data. We present InSight, a framework that unlocks autonomous skill acquisition by rendering VLAs steerable at the primitive-action level (e.g., "move gripper to the bo…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.2

ForceBand: Learning Forceful Manipulation with sEMG

2026-06-24 · Botao He, Zhi Wang, Linna Kuang, Ishaan Ghosh, Jitendra Malik, Cornelia Fermuller, Tingfan Wu, Jiayuan Mao, Ruoshi Liu, Haozhi Qi, Yiannis Aloimonos

General AI

Human demonstrations are a scalable data source for learning robot manipulation policies. However, common sources of human demonstration data, such as motion-capture trajectories and internet videos, capture mostly motion and appearance while missing the contact forces that are critical for force-sensitive manipulation…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.2

MIL-LC: A Robust Magnetometer-Inertial-LiDAR Fusion Multimodal Localization Framework

2026-06-24 · Qiyang Lyu, Zhenyu Wu, Wei Wang, Hongming Shen, Danwei Wang

General AI

Localization in challenging environments, such as GNSS-denied, geometrically repetitive, or textureless scenes commonly found in offices, hotels, and underground parking facilities, remains an open problem for reliable autonomous mobile robot (AMR) deployment. Single-modality localization methods are inherently limited…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.2

Measurable Majorities Are Not Finitely Axiomatizable

2026-06-24 · Lawrence S. Moss, Arthur Paul Pedersen

General AI

This theoretical note studies the finite axiomatizability of strict majority reasoning in finite social decision frames. Moss and Pedersen (2026) <doi: 10.48550/arXiv.2606.23853> introduce a coherence criterion that characterizes exactly when qualitative majority judgments are representable by a finitely additive measu…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.0

Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training

2026-04-02 · William Hoy, Binxu Wang, Xu Pan

Research Track A · General AI

Evolution Strategies (ES) have emerged as a scalable gradient-free alternative to reinforcement learning based LLM fine-tuning, but it remains unclear whether comparable task performance implies comparable solutions in parameter space. We compare ES and Group Relative Policy Optimization (GRPO) across four tasks in bot…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.0

TRU: Targeted Reverse Update for Efficient Multimodal Recommendation Unlearning

2026-04-02 · Zhanting Zhou, KaHou Tam, Ziqiang Zheng, Zeyu Ma

Research Track A · General AI

Multimodal recommendation systems (MRS) jointly model user-item interaction graphs and rich item content, but this tight coupling makes user data difficult to remove once learned. Approximate machine unlearning offers an efficient alternative to full retraining, yet existing methods for MRS mainly rely on a largely uni…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.0

Exclusive Unlearning

2026-04-07 · Mutsumi Sasaki, Kouta Nakayama, Yusuke Miyao, Yohei Oseki, Masaru Isonuma

Research Track A · General AI

When introducing Large Language Models (LLMs) into industrial applications, such as healthcare and education, the risk of generating harmful content becomes a significant challenge. While existing machine unlearning methods can erase specific harmful knowledge and expressions, diverse harmful content makes comprehensiv…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 6.0

How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

2026-04-16 · Zhen Yang, Ping Jian, Zhongbin Guo, Zuming Zhang, Chengzhi Li, Yonghong Deng, Xinyue Zhang, Wenpeng Lu

Research Track A

Over the past year, spatial intelligence has drawn increasing attention. Many prior works study it from the perspective of visual-spatial intelligence, where models have access to visuospatial information from visual inputs. However, in the absence of visual information, whether linguistic intelligence alone is suffici…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling

2026-04-30 · Ansar Aynetdinov, Patrick Haller, Alan Akbik

General AI

Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for high-resource non-English languages like German, French, or Japanese, aggressive filtering creates a strategic dilemma: should practitioners prioritize diversity by tra…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

Generative Modeling with Orbit-Space Particle Flow Matching

2026-05-04 · Sinan Wang, Jinjin He, Shenyifan Lu, Ruicheng Wang, Greg Turk, Bo Zhu

General AI

We present Orbit-Space Geometric Probability Paths (OGPP), a particle-native flow-matching framework for generative modeling of particle systems. OGPP is motivated by two insights: (i) particles are defined up to permutation symmetries, so anonymous indexing inflates per-index target variance and yields curved, hard-to…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

2026-05-06 · Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen, Jun Zhu

General AI

LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

PianoCoRe: Combined and Refined Piano MIDI Dataset

2026-05-07 · Ilya Borovik

General AI

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-sc…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance

2026-05-07 · Ziyun Zeng, Yiqi Lin, Guoqiang Liang, Mike Zheng Shou

General AI

In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly available datasets predominantly focus on local editing or style transfer, which largely preserve the original scene structure and are easier to scale. In contrast, Backgroun…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

Debiased Model-based Representations for Sample-efficient Continuous Control

2026-05-12 · Jiafei Lyu, Zichuan Lin, Scott Fujimoto, Kai Yang, Yangkun Chen, Saiyong Yang, Zongqing Lu, Deheng Ye

General AI

Model-based representations have recently stood out as a promising framework that embeds latent dynamics information into the representations for downstream off-policy actor-critic learning. This framework implicitly combines the advantages of both model-free and model-based approaches while avoiding the training costs associated with …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

2026-05-12 · Haoyu Zhang, Zeyu Zhang, Zedong Zhou, Yang Zhao, Hao Tang

General AI

Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increas…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting

2026-05-26 · Shuang Liang, Chaochuan Hou, Xu Yao, Shiping Wang, Hailiang Huang, Songqiao Han, Minqi Jiang

General AI

While previous research in multivariate time series forecasting has focused on developing complex holistic models, this work advocates for a shift toward a granular, component-level understanding of their impacts. We propose TSCOMP, the first large-scale benchmark that systematically deconstructs deep forecasting metho…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

How can embedding models bind concepts?

2026-05-29 · Arnas Uselis, Darina Koishigarina, Seong Joon Oh

General AI

Humans easily determine which color belongs to which shape in multi-object scenes, an ability known as concept binding. Vision-language embedding models such as CLIP struggle with binding: they recognize individual concepts but fail to represent which concepts form which objects. Although CLIP behaves like a bag-of-con…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 6.0

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

2026-06-09 · Bowen Ping, Xiangxin Zhou, Penghui Qi, Minnan Luo, Liefeng Bo, Tianyu Pang

General AI

Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising process as a Markov Decision Process and apply PPO-style ratio clipping to enforce a trust…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

ProPACT: A Proactive AI-Driven Adaptive Collaborative Tutor for Pair Programming

2026-05-04 · Anahita Golrang, Kshitij Sharma, Olga Viberg

General AI

Effective pair programming depends on coordination of attention, cognitive effort, and joint regulation over time, yet most adaptive learning systems remain individual-centric and reactive. This paper introduces ProPACT, a proactive AI-driven adaptive collaborative tutor that treats collaboration itself as the object o…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

2026-05-07 · Sushant Gautam, Finn Schwall, Annika Willoch Olstad, Fernando Vallecillos Ruiz, Birk Torpmann-Hagen, Sunniva Maria Stordal Bjørklund, Leon Moonen, Klas Pettersen, Michael A. Riegler

General AI

Many deployments must compare candidate language models for safety before a labeled benchmark exists for the relevant language, sector, or regulatory regime. We formalize this setting as benchmarkless comparative safety scoring and specify the contract under which a scenario-based audit can be interpreted as deployment…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

2026-05-07 · Jai Moondra, Ayela Chughtai, Bhargavi Lanka, Swati Gupta

General AI

Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of the decisive votes c…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

xApp Empowered Resource Management for Non-Terrestrial Users in 5G O-RAN Networks

2026-05-11 · Mohammed M. H. Qazzaz, Syed Ali Zaidi, Aubida A. Al-Hameed, Abdelaziz Salama, Des Mclernon

General AI

This paper introduces a proactive Unmanned Aerial Vehicle (UAV) mobility management xApp for Open Radio Access Network (O-RAN) Near Real-Time Radio Intelligent Controller (Near-RT RIC) environments, employing Double Deep Q-Network (DDQN) reinforcement learning (RL) enhanced with transfer learning to optimise handover d…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

2026-05-22 · Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma

General AI

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scaling Law, a unified the…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

Natural Yet Challenging to Detect: Robust In-the-Wild TTS through EMA and Dual-Scoring Prompt Selection -- Submission for WildSpoof 2026 TTS Track

2026-05-22 · Renhe Sun, Jiayi Zhou, Haolin He, Yueying Feng, Jian Liu

General AI

In this technical report, we describe our submission for the WildSpoof Challenge TTS Track: Text-to-Speech with In-the-Wild Data. We introduce F5-TTS-DPS, a model built upon the F5-TTS architecture. Our approach integrates Exponential Moving Average (EMA) into supervised fine-tuning to stabilize training and improve ge…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

Strong Teacher Not Needed? On Distillation in LLM Pretraining

2026-05-22 · Taiming Lu, Zhuang Liu

General AI

Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield better students. In this work, we examine this assumption about distillation in large language model pretraining. By varying architecture sizes and training token budgets, we create strong-to-weak, same-level, and weak-…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

When Youth Enter the Algorithmic Wild: Discovering and Understanding Potentially Harmful Teen Videos on Douyin and Kwai

2026-05-22 · Shaoxuan Zhou, Yafei Sun, Jing Zhang, Xianghang Mi

General AI

Short-video platforms like Douyin and Kwai have become central to adolescent digital life, but they also risk exposing teens to algorithmically amplified harmful content. Despite its societal importance, the scale, mechanisms, and real-world impact of this exposure remain poorly understood. Measuring it is challenging:…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

Demystifying Data Organization for Enhanced LLM Training

2026-05-28 · Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang, Xin Zhang, Wenshan Wu, Qihao Zhao, Hao Li, Yuanyuan Gao, Kim-Hui Yap, Scarlett Li

General AI

Large Language Models (LLMs) have revolutionized various fields, yet their training efficiency is heavily reliant on effective data curation. While data selection has been widely studied, the strategic data organization for enhanced training remains an underexplored area, particularly since current LLMs are often train…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

2026-05-28 · Jusuk Lee, Seungjae Lee, Jonghun Shin, Hoseong Jung, Sungha Kim, Daesol Cho, H. Jin Kim, Jia-Bin Huang, Furong Huang

General AI

Robot manipulation critically depends on perception that preserves the action-relevant aspects of a scene. Yet most robot learning pipelines are built upon visual encoders pre-trained for static recognition or vision-language alignment, leaving motion understanding to downstream policies. We introduce DynaFLIP, a dynam…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

GMOS: Grounding Moving Object Segmentation in 3D Space and Time

2026-05-28 · Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman

General AI

Moving Object Segmentation (MOS) aims to discover, segment, and track objects that move independently of the camera. Current MOS methods, however, exhibit two fundamental limitations: they rely on pre-computed 2D auxiliary modalities such as optical flow or point trajectories that lack 3D geometric information, and the…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

A Datalog Framework for Conflict-Free Replicated Data Types

2026-05-29 · Elena Yanakieva, Annette Bieniusa, Stefania Dumbrava

General AI

Distributed applications increasingly support local-first collaboration over shared data, allowing multiple users to perform updates concurrently without global coordination. Such collaboration requires careful design to capture the intended semantics of the concurrent interactions. We introduce a declarative framework…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

2026-05-29 · Ulrich Prestel, Stefan Andreas Baumann, Nick Stracke, Björn Ommer

General AI

Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network system designs. We introduce RayDer, a unified, feed-forward transformer that consolidate…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.8

AI-Native Closed-Loop Security for 6G-Enabled Cyber-Physical Systems: From Edge Detection to Network-Wide Mitigation

2026-06-06 · Bilal Hussain, Muhammad Bilal, Tan Li, Haris Pervaiz, Xiao Tang, Qinghe Du, Fawad Ahmad, Muhammad Azhar, Jun Zhang

General AI

In sixth-generation (6G) networks, billions of cyber-physical systems (CPSs) - autonomous vehicles, smart grids, industrial robots, and remote-surgical equipment - will run over ultra-reliable low-latency slices, collapsing the gap between a remote breach and physical harm to milliseconds, a budget perimeter firewalls …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 5.5

AVControl: Efficient Framework for Training Audio-Visual Controls

2026-03-25 · Matan Ben-Yosef, Tavi Halperin, Naomi Ken Korem, Mohammad Salama, Harel Cain, Asaf Joseph, Anthony Chen, Urska Jelercic, Ofir Bibi

General AI

Controlling video and audio generation requires diverse modalities, from depth and pose to camera trajectories and audio transformations, yet existing approaches either train a single monolithic model for a fixed set of controls or introduce costly architectural changes for each new modality. We introduce AVControl, a …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 5.5

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

2026-03-25 · Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna

General AI

Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.5

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

2026-03-29 · Zhongying Deng, Cheng Tang, Ziyan Huang, Jiashi Lin, Ying Chen, Junzhi Ning, Chenglong Ma, Jiyao Liu, Wei Li, Yinghao Zhu, Shujian Gao, Yanyan Huang, Sibo Ju, Yanzhou Su, Pengcheng Chen, Wenhao Tang, Tianbin Li, Haoyu Wang, Yuanfeng Ji, Hui Sun, Shaobo Min, Liang Peng, Feilong Tang, Haochen Xue, Rulin Zhou, Chaoyang Zhang, Wenjie Li, Shaohao Rui, Weijie Ma, Xingyue Zhao, Yibin Wang, Kun Yuan, Zhaohui Lu, Shujun Wang, Jinjie Wei, Lihao Liu, Dingkang Yang, Lin Wang, Yulong Li, Haolin Yang, Yiqing Shen, Lequan Yu, Xiaowei Hu, Yun Gu, Yicheng Wu, Benyou Wang, Minghui Zhang, Angelica I. Aviles-Rivero, Qi Gao, Hongming Shan, Xiaoyu Ren, Fang Yan, Hongyu Zhou, Haodong Duan, Maosong Cao, Shanshan Wang, Bin Fu, Xiaomeng Li, Zhi Hou, Chunfeng Song, Lei Bai, Yuan Cheng, Yuandong Pu, Xiang Li, Wenhai Wang, Hao Chen, Jiaxin Zhuang, Songyang Zhang, Huiguang He, Mengzhang Li, Bohan Zhuang, Zhian Bai, Rongshan Yu, Liansheng Wang, Yukun Zhou, Xiaosong Wang, Xin Guo, Guanbin Li, Xiangru Lin, Dakai Jin, Mianxin Liu, Wenlong Zhang, Qi Qin, Conghui He, Yuqiang Li, Ye Luo, Nanqing Dong, Jie Xu, Wenqi Shao, Bo Zhang, Qiujuan Yan, Yihao Liu, Jun Ma, Zhi Lu, Yuewen Cao, Zongwei Zhou, Jianming Liang, Shixiang Tang, Qi Duan, Dongzhan Zhou, Chen Jiang, Yuyin Zhou, Yanwu Xu, Jiancheng Yang, Shaoting Zhang, Xiaohong Liu, Siqi Luo, Yi Xin, Chaoyu Liu, Haochen Wen, Xin Chen, Alejandro Lozano, Min Woo Sun, Yuhui Zhang, Yue Yao, Xiaoxiao Sun, Serena Yeung-Levy, Xia Li, Jing Ke, Chunhui Zhang, Zongyuan Ge, Ming Hu, Jin Ye, Zhifeng Li, Yirong Chen, Yu Qiao, Junjun He

Research Track A

Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the rise of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, the curation and assembly of such medical datasets are highly challenging due to the reliance on clinical e…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 5.5

GEditBench v2: A Human-Aligned Benchmark for General Image Editing

2026-03-30 · Zhangqi Jiang, Zheng Sun, Xianfang Zeng, Yufeng Yang, Xuanyang Zhang, Yongliang Wu, Wei Cheng, Gang Yu, Xu Yang, Bihan Wen

General AI

Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics fail to adequately capture visual consistency, i.e., the preservation of identity, …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 5.5

NearID: Identity Representation Learning via Near-identity Distractors

2026-04-02 · Aleksandar Cvejic, Rameen Abdal, Abdelrahman Eldesokey, Bernard Ghanem, Peter Wonka

General AI

When evaluating identity-focused tasks such as personalized generation and image editing, existing vision encoders entangle object identity with background context, leading to unreliable representations and metrics. We introduce the first principled framework to address this vulnerability using Near-identity (NearID) d…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 5.5

ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation

2026-04-05 · Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li

General AI

Selecting LLM-generated code candidates using LLM-generated tests is challenging because the tests themselves may be incorrect. Existing methods either treat all tests equally or rely on ad-hoc heuristics to filter unreliable tests. Yet determining test correctness requires knowing which codes are correct, creating a c…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

Demystifying When Pruning Works via Representation Hierarchies

2026-04-06 · Shwai He, Guoheng Sun, Haichao Zhang, Yun Fu, Ang Li

General AI

Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. …

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

2026-04-06 · Yicheng Xiao, Wenhu Zhang, Lin Song, Yukang Chen, Wenbo Li, Nan Jiang, Tianhe Ren, Haokun Lin, Wei Huang, Haoyang Huang, Xiu Li, Nan Duan, Xiaojuan Qi

General AI

Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. Current models are insufficient for fine-grained spatial manipulations, motivating a dedicated assessment suite. Our contributions are as follows: (i) We introduce SpatialEdit-Bench, a complete…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models

2026-04-13 · Md Tanvirul Alam

General AI

Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mappi…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

Continuous Adversarial Flow Models

2026-04-13 · Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan

General AI

We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which uses a fixed mean-squared-error criterion, our approach introduces a learned discriminator to guide training. This change in objective induces a different generalized di…
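
For context, the fixed mean-squared-error criterion of standard flow matching that the abstract contrasts against can be sketched as below. This is a minimal pure-Python sketch of the commonly used linear-path variant, not the adversarial objective of this paper (the abstract is cut off before describing it). The function name `flow_matching_loss`, the `velocity_model` callable, and the choice of a linear interpolation path are assumptions made for illustration.

```python
import random


def flow_matching_loss(velocity_model, x0_batch, x1_batch, rng):
    """Mean-squared error between predicted and target velocities.

    Assumes the linear path x_t = (1 - t) * x0 + t * x1, whose target
    velocity is the constant x1 - x0. `velocity_model(x_t, t)` is a
    hypothetical callable returning a list with the same length as x_t.
    """
    total, count = 0.0, 0
    for x0, x1 in zip(x0_batch, x1_batch):
        t = rng.random()  # sample a time in [0, 1)
        x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
        target = [b - a for a, b in zip(x0, x1)]
        pred = velocity_model(x_t, t)
        total += sum((p - v) ** 2 for p, v in zip(pred, target))
        count += len(target)
    return total / count


# Toy usage: one noise/data pair and two hand-written "models".
noise = [[0.0, 0.0]]
data = [[1.0, 2.0]]
perfect = flow_matching_loss(lambda x, t: [1.0, 2.0], noise, data, random.Random(0))
zero = flow_matching_loss(lambda x, t: [0.0, 0.0], noise, data, random.Random(0))
```

The criterion is "fixed" in the sense that the squared-error distance is chosen in advance; a learned discriminator, as the abstract describes, would replace that hand-picked distance.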

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

2026-04-13 · Dujun Nie, Fengjiao Chen, Qi Lv, Jun Kuang, Xiaoyu Li, Xuezhi Cao, Xunliang Cai

General AI

While the shortage of explicit action data limits Vision-Language-Action (VLA) models, human action videos offer a scalable yet unlabeled data source. A critical challenge in utilizing large-scale human video datasets lies in transforming visual signals into ontology-independent representations, known as latent actions…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions

2026-04-13 · Seongyu Kim, Seungwoo Lee, Hyeonggon Ryu, Joon Son Chung, Arda Senocak

General AI

We address the problem of tactile localization, where the goal is to identify image regions that share the same material properties as a tactile input. Existing visuo-tactile methods rely on global alignment and thus fail to capture the fine-grained local correspondences required for this task. The challenge is amplifi…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment

2026-04-13 · Bingyi Cao, Koert Chen, Kevis-Kokitsi Maninis, Kaifeng Chen, Arjun Karpur, Ye Xia, Sahil Dua, Tanmaya Dabral, Guangxing Han, Bohyung Han, Joshua Ainslie, Alex Bewley, Mithun Jacob, René Wagner, Washington Ramos, Krzysztof Choromanski, Mojtaba Seyedhosseini, Howard Zhou, André Araujo

General AI

Recent progress in vision-language pretraining has enabled significant improvements to many downstream computer vision applications, such as classification, retrieval, segmentation and depth prediction. However, a fundamental capability that these models still struggle with is aligning dense patch representations with …

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

Self-Adversarial One Step Generation via Condition Shifting

2026-04-14 · Deyuan Liu, Peng Sun, Yansen Han, Zhenglin Cheng, Chuyan Chen, Tao Lin

General AI

The push for efficient text-to-image synthesis has moved the field toward one-step sampling, yet existing methods still face a three-way tradeoff among fidelity, inference speed, and training efficiency. Approaches that rely on external discriminators can sharpen one-step performance, but they often introduce training …

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics

2026-04-17 · Heewon Oh

General AI

We present ArtifactNet, a lightweight framework that detects AI-generated music by reframing the problem as forensic physics -- extracting and analyzing the physical artifacts that neural audio codecs inevitably imprint on generated audio. A bounded-mask UNet (ArtifactUNet, 3.6M parameters) extracts codec residuals fro…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation

2026-04-17 · Tristan Kirscher, Alexandra Ertl, Klaus Maier-Hein, Xavier Coubez, Philippe Meyer, Sylvain Faisan

General AI

Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calib…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

Test-Time Adaptation for EEG Foundation Models: A Systematic Study under Real-World Distribution Shifts

2026-04-18 · Gabriel Jason Lee, Jathurshan Pradeepkumar, Jimeng Sun

General AI

Electroencephalography (EEG) foundation models have shown strong potential for learning generalizable representations from large-scale neural data, yet their clinical deployment is hindered by distribution shifts across clinical settings, devices, and populations. Test-time adaptation (TTA) offers a promising solution …

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

LLM Safety From Within: Detecting Harmful Content with Internal Representations

2026-04-20 · Difan Jiao, Yilun Liu, Ye Yuan, Zhenwei Tang, Linfeng Du, Haolun Wu, Ashton Anderson

General AI

Guard models are widely used to detect harmful content in user prompts and LLM responses. However, state-of-the-art guard models rely solely on terminal-layer representations and overlook the rich safety-relevant features distributed across internal layers. We present SIREN, a lightweight guard model that harnesses the…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers

2026-04-21 · Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Marco Huber, Andrea Atzori, Naser Damer, Fadi Boutros

General AI

Face Image Quality Assessment (FIQA) aims to assess the recognition utility of face samples and is essential for reliable face recognition (FR) systems. Existing approaches require computationally expensive procedures such as multiple forward passes, backpropagation, or additional training, and only recent work has foc…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

EX-FIQA: Leveraging Intermediate Early eXit Representations from Vision Transformers for Face Image Quality Assessment

2026-04-21 · Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Andrea Atzori, Fadi Boutros, Naser Damer

General AI

Face Image Quality Assessment is crucial for reliable face recognition systems, yet existing Vision Transformer-based approaches rely exclusively on final-layer representations, ignoring quality-relevant information captured at intermediate network depths. This paper presents the first comprehensive investigation of ho…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

2026-04-22 · Adriana Aida, Walida Amer, Katarina Bankovic, Dhruv Behl, Fabian Busch, Annie Bhalla, Minh Duong, Florian Gienger, Rohan Godse, Denis Grachev, Ralf Gulde, Elisa Hagensieker, Junpeng Hu, Shivam Joshi, Tobias Knoblauch, Likith Kumar, Damien LaRocque, Keerthana Lokesh, Omar Moured, Khiem Nguyen, Christian Preyss, Ranjith Sriganesan, Vikram Singh, Carsten Sponner, Anh Tong, Dominik Tuscher, Marc Tuscher, Pavan Upputuri

General AI

Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evalu…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

2026-04-23 · Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge

General AI

Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchmark with private scenes and trajectories, making fair cross-model comparison impossible. Existing public benchmarks offer useful metrics such as trajectory error, aesth…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

2026-04-27 · Zhongjie Duan, Hong Zhang, Yingda Chen

General AI

Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats, and runtime hooks. This fragmentation makes it difficult to reuse infrastructure across t…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains

2026-04-27 · Emaan Bilal Khan, Amy Winecoff, Miranda Bogen, Dylan Hadfield-Menell

General AI

Foundation models are routinely fine-tuned for use in particular domains, yet safety assessments are typically conducted only on base models, implicitly assuming that safety properties persist through downstream adaptation. We test this assumption by analyzing the safety behavior of 100 models, including widely deploye…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 5.5

ViPO: Visual Preference Optimization at Scale

2026-04-29 · Ming Li, Jie Wu, Justin Cui, Xiaojie Li, Rui Wang, Chen Chen

General AI

While preference optimization is crucial for improving visual generative models, how to effectively scale this paradigm remains largely unexplored. Current open-source preference datasets contain conflicting preference patterns, where winners excel in some dimensions but underperform in others. Naively optimizing on su…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Parameter-Efficient Fine-Tuning for Medical Text Summarization: A Comparative Study of Lora, Prompt Tuning, and Full Fine-Tuning

2026-03-23 · Ulugbek Shernazarov, Rostislav Svitsov, Bin Shi

General AI

Fine-tuning large language models for domain-specific tasks such as medical text summarization demands substantial computational resources. Parameter-efficient fine-tuning (PEFT) methods offer promising alternatives by updating only a small fraction of parameters. This paper compares three adaptation approaches: Low-Ran…

Review: pending
Role: unreviewed
Read: now

arxiv Score 5.3

Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion

2026-03-23 · Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

General AI

Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit gener…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

A Mentalistic Interface for Probing Folk-Psychological Attribution to Non-Humanoid Robots

2026-03-26 · Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier, Andrea Dan Ryals, Lorenzo Pollini, Mario G. C. A. Cimino

General AI

This paper presents an experimental platform for studying intentional-state attribution toward a non-humanoid robot. The system combines a simulated robot, realistic task environments, and large language model-based explanatory layers that can express the same behavior in mentalistic, teleological, or mechanistic terms…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Measuring What Matters -- or What's Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors

2026-03-26 · Cole Walsh, Rodica Ivan

General AI

Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable or superior to those of trained human raters, but have frequently been shown to be vulnerable to the influence of construct-i…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?

2026-03-30 · Ashwini Dasare, Nirmesh Shah, Ashishkumar Gudmalwar, Pankaj Wasnik

General AI

Evaluating AI-generated dubbed content is inherently multi-dimensional, shaped by synchronization, intelligibility, speaker consistency, emotional alignment, and semantic context. Human Mean Opinion Scores (MOS) remain the gold standard but are costly and impractical at scale. We present a hierarchical multimodal archi…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

DinoDental: Benchmarking DINOv3 as a Unified Vision Encoder for Dental Image Analysis

2026-03-30 · Kun Tang, Xinquan Yang, Mianjie Zheng, Xuefen Liu, Xuguang Li, Xiaoqi Guo, Ruihan Chen, Linlin Shen, He Meng

General AI

The scarcity and high cost of expert annotations in dental imaging present a significant challenge for the development of AI in dentistry. DINOv3, a state-of-the-art, self-supervised vision foundation model pre-trained on 1.7 billion images, offers a promising pathway to mitigate this issue. However, its reliability wh…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

VAANI: Capturing the language landscape for an inclusive digital India

2026-03-30 · Sujith Pulikodan, Abhayjeet Singh, Agneedh Basu, Lokesh Rady, Nihar Desai, Pavan Kumar J, Prajjwal Srivastav, Pranav D Bhat, Raghu Dharmaraju, Ritika Gupta, Sathvik Udupa, Saurabh Kumar, Sumit Sharma, Vaibhav Vishwakarma, Visruth Sanka, Dinesh Tewari, Harsh Dhand, Amrita Kamat, Sukhwinder Singh, Shikhar Vashishth, Partha Talukdar, Raj Acharya, Prasanta Kumar Ghosh

General AI

Project VAANI is an initiative to create an India-representative multi-modal dataset that comprehensively maps India's linguistic diversity, starting with 165 districts across the country in its first two phases. Speech data is collected through a carefully structured process that uses image-based prompts to encourage …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Benchmarking PhD-Level Coding in 3D Geometric Computer Vision

2026-03-31 · Wenyi Li, Renkai Luo, Yue Yu, Huan-ang Gao, Mingju Gao, Li Yuan, Chaoyou Fu, Hao Zhao

General AI

AI-assisted coding has rapidly reshaped software practice and research workflows, yet today's models still struggle to produce correct code for complex 3D geometric vision. If models could reliably write such code, the research of our community would change substantially. To measure progress toward that goal, we introd…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives

2026-03-31 · Mohammadhossein Khojasteh, Yifan Jiang, Stefano De Giorgis, Frank van Harmelen, Filip Ilievski

General AI

Analogical reasoning is a key driver of human generalization in problem-solving and argumentation. Yet, analogies between narrative structures remain challenging for machines. Cognitive engines for structural mapping are not directly applicable, as they assume pre-extracted entities, whereas LLMs' performance is sensit…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

ReFormeR: Learning and Applying Explicit Query Reformulation Patterns

2026-04-01 · Amin Bigdeli, Mert Incesu, Negar Arabzadeh, Charles L. A. Clarke, Ebrahim Bagheri

General AI

We present ReFormeR, a pattern-guided approach for query reformulation. Instead of prompting a language model to generate reformulations of a query directly, ReFormeR first elicits short reformulation patterns from pairs of initial queries and empirically stronger reformulations, consolidates them into a compact librar…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers

2026-04-01 · Kawtar Zaher, Olivier Buisson, Alexis Joly

General AI

Building on existing approaches, we revisit Human-in-the-Loop Object Retrieval, a task that consists of iteratively retrieving images containing objects of a class-of-interest, specified by a user-provided query. Starting from a large unlabeled image collection, the aim is to rapidly identify diverse instances of an ob…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Transformer self-attention encoder-decoder with multimodal deep learning for response time series forecasting and digital twin support in wind structural health monitoring

2026-04-02 · Feiyu Zhou, Marios Impraimakis

General AI

The wind-induced structural response forecasting capabilities of a novel transformer methodology are examined here. The model also provides a digital twin component for bridge structural health monitoring. Firstly, the approach uses the temporal characteristics of the system to train a forecasting model. Secondly, the …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

LoMa: Local Feature Matching Revisited

2026-04-06 · David Nordström, Johan Edstedt, Georg Bökman, Jonathan Astermark, Anders Heyden, Viktor Larsson, Mårten Wadenbäck, Michael Felsberg, Fredrik Kahl

General AI

Local feature matching has long been a fundamental component of 3D vision systems such as Structure-from-Motion (SfM), yet progress has lagged behind the rapid advances of modern data-driven approaches. The newer approaches, such as feed-forward reconstruction models, have benefited extensively from scaling dataset siz…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

CoStream: Codec-Guided Resource-Efficient System for Video Streaming Analytics

2026-04-07 · Yulin Zou, Yan Chen, Wenyan Chen, JooYoung Park, Shivaraman Nitin, Luo Tao, Francisco Romero, Dmitrii Ustiugov

General AI

Video streaming analytics is a crucial workload for vision-language model serving, but the high cost of multimodal inference limits scalability. Prior systems reduce inference cost by exploiting temporal and spatial redundancy in video streams, but they target either the vision transformer (ViT) or the LLM with a limit…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors

2026-04-07 · Junbin Zhang, Meng Cao, Feng Tan, Yikai Lin, Yuexian Zou

General AI

Achieving fine-grained and structurally sound controllability is a cornerstone of advanced visual generation. Existing part-based frameworks treat user-provided parts as an unordered set and therefore ignore their intrinsic spatial and semantic relationships, which often results in compositions that lack structural int…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

2026-04-07 · Lin Mu, Haiyang Wang, Li Ni, Lei Sang, Zhize Wu, Peiquan Jin, Yiwen Zhang

General AI

Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs), and recent Mixture-of-Experts (MoE) extensions further enhance flexibility by dynamically combining multiple LoRA experts. However, existing MoE-augmented LoRA methods assume that experts operate independently, often lea…
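
As a reminder of the baseline this abstract builds on, the sketch below shows the standard LoRA update (a frozen weight plus a scaled low-rank product) and a plain gate-weighted mixture of independent LoRA experts. It does not show TalkLoRA's communication mechanism, which the truncated abstract does not describe. The helper names `matvec`, `lora_delta`, and `mixture_of_lora_forward`, and the fixed gate weights, are assumptions for illustration only.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_delta(A, B, x, alpha):
    """Low-rank update (alpha / r) * B @ (A @ x).

    A has shape (r, d_in) and B has shape (d_out, r), so r = len(A).
    """
    r = len(A)
    down = matvec(A, x)   # project input down to rank r
    up = matvec(B, down)  # project back up to d_out
    return [alpha / r * u for u in up]


def mixture_of_lora_forward(W, experts, gate_weights, x, alpha=1.0):
    """Frozen base output W @ x plus a gate-weighted sum of expert updates.

    Each expert contributes independently here; no information is
    exchanged between experts in this baseline sketch.
    """
    y = matvec(W, x)
    for (A, B), gate in zip(experts, gate_weights):
        delta = lora_delta(A, B, x, alpha)
        y = [yi + gate * di for yi, di in zip(y, delta)]
    return y


# Toy usage: 2-D identity backbone with two rank-1 experts.
W = [[1.0, 0.0], [0.0, 1.0]]
expert_a = ([[1.0, 0.0]], [[1.0], [0.0]])  # reads and writes dimension 0
expert_b = ([[0.0, 1.0]], [[0.0], [1.0]])  # reads and writes dimension 1
output = mixture_of_lora_forward(W, [expert_a, expert_b], [0.5, 0.5], [1.0, 2.0])
```

In practice the gate weights would come from a learned router rather than being fixed constants.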

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

2026-04-09 · Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou, Hui Wang, Baole Fang, Yang Tian, Mulin Yu, Qiaojun Yu, Li Ma, Hengjie Li, Hanqing Wang, Jia Zeng, Jiangmiao Pang

General AI

Robotic manipulation with deformable objects represents a data-intensive regime in embodied learning, where shape, contact, and topology co-evolve in ways that far exceed the variability of rigids. Although simulation promises relief from the cost of real-world data acquisition, prevailing sim-to-real pipelines remain …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

$λ_A$: A Typed Lambda Calculus for LLM Agent Composition

2026-04-13 · Qin Liu

General AI

Existing LLM agent frameworks lack formal semantics: there is no principled way to determine whether an agent configuration is well-formed or will terminate. We present $λ_A$, a typed lambda calculus for agent composition that extends the simply-typed lambda calculus with oracle calls, bounded fixpoints (the ReAct loop…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Who Handles Orientation? Investigating Invariance in Feature Matching

2026-04-13 · David Nordström, Johan Edstedt, Fredrik Kahl, Georg Bökman

General AI

Finding matching keypoints between images is a core problem in 3D computer vision. However, modern matchers struggle with large in-plane rotations. A straightforward mitigation is to learn rotation invariance via data augmentation. However, it remains unclear at which stage rotation invariance should be incorporated. I…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem

2026-04-14 · Yinghao Qin, Mosab Bazargani, Edmund K. Burke, Carlos A. Coello Coello, Zhongmin Song, Jun Chen

General AI

This paper tackles the Electric Capacitated Vehicle Routing Problem (E-CVRP) through a bilevel optimization framework that handles routing and charging decisions separately or jointly depending on the search stage. By analyzing their interaction, we introduce a surrogate objective at the upper level to guide the search…
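
The abstract is cut off before it details the bilevel scheme, so the sketch below covers only the generic Late Acceptance Hill Climbing acceptance rule named in the title, in its commonly described form: a candidate is accepted if it is no worse than the current solution or no worse than the cost recorded a fixed number of iterations earlier. The function and parameter names, and the toy one-dimensional problem, are hypothetical; nothing here models routing or charging decisions.

```python
import random


def late_acceptance_hill_climbing(initial, cost, neighbor, history_length, iterations, rng):
    """Generic Late Acceptance Hill Climbing for minimization."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    history = [current_cost] * history_length  # circular buffer of past costs

    for i in range(iterations):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        slot = i % history_length
        # Late acceptance: compare against the cost stored history_length steps ago.
        if candidate_cost <= history[slot] or candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        history[slot] = current_cost  # record the (possibly updated) current cost

    return best, best_cost


# Toy usage: minimize (x - 3)^2 over the integers using +/-1 moves.
def toy_cost(x):
    return (x - 3) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))


best, best_cost = late_acceptance_hill_climbing(
    initial=10, cost=toy_cost, neighbor=toy_neighbor,
    history_length=5, iterations=5000, rng=random.Random(0),
)
```

A longer history makes the search more tolerant of temporarily worse candidates, at the price of slower convergence.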

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker

2026-04-14 · Junbin Su, Ziteng Xue, Shihui Zhang, Kun Chen, Weiming Hu, Zhipeng Zhang

General AI

Parameter-efficient fine-tuning (PEFT) in multimodal tracking reveals a concerning trend where recent performance gains are often achieved at the cost of inflated parameter budgets, which fundamentally erodes PEFT's efficiency promise. In this work, we introduce SEATrack, a Simple, Efficient, and Adaptive two-stream mu…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Abstract Sim2Real through Approximate Information States

2026-04-16 · Yunfu Deng, Yuhao Li, Josiah P. Hanna

General AI

In recent years, reinforcement learning (RL) has shown remarkable success in robotics when a fast and accurate simulator is available for a given task. When using RL and simulation, more simulator realism is generally beneficial but becomes harder to obtain as robots are deployed in increasingly complex and widescale d…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

ASMR-Bench: Auditing for Sabotage in ML Research

2026-04-17 · Eric Gan, Aryan Bhatt, Buck Shlegeris, Julian Stastny, Vivek Hebbar

General AI

As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results while evading detection. We introduce ASMR-Bench (Auditing for Sabotage in ML Research), a benchmark for evaluating the ability of auditors to detect sabotage in ML resea…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

CIG: Measuring Conversational Information Gain in Deliberative Dialogues with Semantic Memory Dynamics

2026-04-17 · Ming-Bin Chen, Jey Han Lau, Lea Frermann

General AI

Measuring the quality of public deliberation requires evaluating not only civility or argument structure, but also the informational progress of a conversation. We introduce a framework for Conversational Information Gain (CIG) that evaluates each utterance in terms of how it advances collective understanding of the ta…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Investigating Conversational Agents to Support Secondary School Students Learning CSP

2026-04-17 · Matthew Frazier, Kostadin Damevski, Lori Pollock

General AI

Secondary school students enrolled in the AP Computer Science Principles (CSP) course commonly utilize web resources (e.g., tutorials, Q&A sites) to better understand key concepts in the curriculum. The primary obstacle to using these resources is finding information appropriate for the learning task and student's bac…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability

2026-04-20 · Savya Khosla, Sethuraman T, Aryan Chadha, Alex Schwing, Derek Hoiem

General AI

Despite recent progress, vision-language encoders struggle with two core limitations: (1) weak alignment between language and dense vision features, which hurts tasks like open-vocabulary semantic segmentation; and (2) high token counts for fine-grained visual representations, which limits scalability to long videos. T…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

InvestChat: Exploring Multimodal Interaction via Natural Language, Touch, and Pen in an Investment Dashboard

2026-04-21 · Sarah Lykke Tost, Adson Lucas de Paiva Sales, Henrik Østergaard, Vaishali Dhanoa, Gabriela Molina León

General AI

We designed and implemented InvestChat, a multimodal tablet-based application that supports stock market exploration with multiple coordinated views and an LLM-powered chat. We evaluated the application with 12 novice investors. Our findings suggest that combining natural language, touch, and pen input during stock mar…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

2026-04-21 · Mengting Chen, Zhengrui Chen, Yongchao Du, Zuan Gao, Taihang Hu, Jinsong Lan, Chao Lin, Yefeng Shen, Xingjian Wang, Zhao Wang, Zhengtao Wu, Xiaoli Xu, Zhengze Xu, Hao Yan, Mingzhou Zhang, Jun Zheng, Qinye Zhou, Xiaoyong Zhu, Bo Zheng

General AI

Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly efficient. First, our syst…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

2026-04-21 · Jean Mercat, Sedrick Keh, Kushal Arora, Isabella Huang, Paarth Shah, Haruki Nishimura, Shun Iwase, Katherine Liu

General AI

We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize on the action training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

FedSIR: Spectral Client Identification and Relabeling for Federated Learning with Noisy Labels

2026-04-22 · Sina Gholami, Abdulmoneam Ali, Tania Haghighi, Ahmed Arafa, Minhaj Nur Alam

General AI

Federated learning (FL) enables collaborative model training without sharing raw data; however, the presence of noisy labels across distributed clients can severely degrade the learning performance. In this paper, we propose FedSIR, a multi-stage framework for robust FL under noisy labels. Different from existing appro…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

2026-04-23 · Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu

General AI

LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models, improving performance on specialized tasks. This quality and ease of dissemination drive the emergence of a skill economy: free s…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

MathDuels: Evaluating LLMs as Problem Posers and Solvers

2026-04-23 · Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik

General AI

As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We introduce MathDuels, a self-play benchmark in which models occupy …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 5.3

Code for All: Educational Applications of the "Vibe Coding" Hackathon in Programming Education across All Skill Levels

2026-04-24 · Ashley J. Chen, Yijia Cao, Minghao Shao, Ramesh Karri, Muhammad Shafique

General AI

The emergence of large language models has enabled vibe coding, a natural language approach to programming in which users describe intent and AI generates or revises code, potentially broadening access to programming while preserving meaningful learning outcomes. We investigate its educational value through a month-lon…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

CosmicDancePro -- Measuring LEO satellite's orbital decay and network connectivity implications during solar storms

2026-04-24 · Suvam Basak, Amitangshu Pal, Debopam Bhattacherjee

General AI

The May 2024 solar superstorm highlighted the vulnerability of rapidly expanding low Earth orbit (LEO) satellite networks to severe space weather events. To systematically evaluate LEO network resilience, we introduce an open-source tool, CosmicDancePro. It enables a comprehensive analysis of the effects of solar storm…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

2026-04-26 · Sophie Chiang, Tom Brennan, Fethiye Irmak Dogan, Jiaee Cheong, Hatice Gunes

General AI

In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision-Language Models (VLMs), their deployment in clinical settings has raised concerns due to their lack of transparency and…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

A systematic literature Review for Transformer-based Software Vulnerability detection

2026-04-27 · Fiza Naseer, Javed Ali Khan, Muhammad Yaqoob, Alexios Mylonas, Ishaya Gambo

General AI

Context: Software vulnerabilities pose significant security threats to software systems, especially as software is increasingly used across many areas of daily life, including health, government, and finance. Recently, transformer-based models have demonstrated promising results in automatic software vulnerability iden…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Aligned Multi-View Scripts for Universal Chart-to-Code Generation

2026-04-27 · Zhihan Zhang, Lizi Liao

General AI

Chart-to-code generation converts a chart image into an executable plotting script, enabling faithful reproduction and editable visualizations. Existing methods are largely Python-centric, limiting practical use and overlooking a critical source of supervision: the same chart can be expressed by semantically equivalent…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

MIMIC: A Generative Multimodal Foundation Model for Biomolecules

2026-04-27 · Siavash Golkar, Jake Kovalic, Irina Espejo Morales, Samuel Sledzieski, Minhuan Li, Ksenia Sokolova, Geraud Krawezik, Alberto Bietti, Claudia Skok Gibbs, Roman Klypa, Shengwei Xiong, Francois Lanusse, Liam Parker, Kyunghyun Cho, Miles Cranmer, Tom Hehir, Michael McCabe, Lucas Meyer, Rudy Morel, Payel Mukhopadhyay, Mariel Pettee, Helen Qu, Jeff Shen, David Fouhey, Hadi Sotoudeh, Vikram Mulligan, Pilar Cossio, Sonya M. Hanson, Alisha N. Jones, Olga G. Troyanskaya, Shirley Ho

General AI

Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or for a fixed forward task. We present MIMIC, a generative multimodal foundation model trained on our newly curated and ali…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer

2026-04-27 · Boyang Wang, Guangyi Xu, Zhipeng Tang, Jiahui Zhang, Zezhou Cheng

General AI

Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. While SBD has been widely studied in the literature, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-d…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

2026-04-27 · Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang, Zeyu Zhang, Yefei He, Yanbo Ding, Xirui Hu, Donny Y. Chen, Zhiyuan He, Yuqing Yang, Bohan Zhuang

General AI

Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns v…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

A well-motivated model of pedestrian dynamics

2026-04-29 · Ezel Üsten, Anna Sieben, Mohcine Chraibi, Armin Seyfried

General AI

In pedestrian dynamics, the internal drive that propels individuals toward their goals is typically captured by a single, fixed parameter, the desired walking speed. This simplification overlooks that motivation fluctuates in response to changing spatial and social conditions within a crowd. This paper proposes a dynam…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

2026-04-29 · Yeheng Chen, Chaoxiang Xie, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, Xiaodong Gu

General AI

LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes -- compositional code creation, i.e., building a complete, internally structured class from a specification -- remains underserved. Current evaluations a…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

HealthNLP_Retrievers at ArchEHR-QA 2026: Cascaded LLM Pipeline for Grounded Clinical Question Answering

2026-04-29 · Md Biplob Hosen, Md Alomgeer Hussein, Md Akmol Masud, Omar Faruque, Tera L Reynolds, Lujie Karen Chen

General AI

Patient portals now give individuals direct access to their electronic health records (EHRs), yet access alone does not ensure patients understand or act on the complex clinical information contained in these records. The ArchEHR-QA 2026 shared task addresses this challenge by focusing on grounded question answering ov…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

MoRFI: Monotonic Sparse Autoencoder Feature Identification

2026-04-29 · Dimitris Dimakopoulos, Shay B. Cohen, Ioannis Konstas

General AI

Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next token prediction. Subsequent stages of post-training often introduce new facts outwith the parametric knowledge, giving rise to hallucinations. While it has been demonstrated that supervised fine-tuning (SFT…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

An adaptive wavelet-based PINN for problems with localized high-magnitude source

2026-04-30 · Himanshu Pandey, Ratikanta Behera

General AI

In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer from two fundamental limitations, namely, spectral bias inherent in neural networks and loss imbalance arising from multiscale phenomena. This paper proposes an adaptive w…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Generalizable Sparse-View 3D Reconstruction from Unconstrained Images

2026-04-30 · Vinayak Gupta, Chih-Hao Lin, Shenlong Wang, Anand Bhattad, Jia-Bin Huang

General AI

Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and fails under sparse v…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure

2026-05-01 · Zihao Ding, Beining Wu, Jun Huang

General AI

Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private. However, joint embedding training entangles forgotten knowledge across both modalities and client gradient subspaces, hindering federated unlearning. Previous federated unlearning appr…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

2026-05-01 · Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb

General AI

Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based methods, including late fusion of image feat…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

2026-05-01 · Shradha Sharma, Swapnil Dhamal, Shweta Jain

General AI

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributi…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

2026-06-04 · Qintong Xie, Edward Koh, Xavier Cadet, Peter Chin

General AI

Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We study multi-turn simultaneous bidding as a controlled testbed for such problems and propos…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Discrete Causal Representations from Heterogeneous Domains: A Bayesian Approach with Social Survey Applications

2026-06-04 · Ankur Garg, Michael Stettler, Aaron Schein, Julius von Kügelgen

General AI

Causal representation learning aims to infer the high-level latent causal concepts that give rise to observed low-level measurements. This is particularly relevant for heterogeneous data from different environments or domains since distribution shifts often arise through sparse, localized changes in some of the underly…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

2026-06-04 · Jui-Hui Chung, Ziyang Cai, Zihao Li, Qishuo Yin, Rohit Agarwal, Simon Park, Rodrigo Porto, Narutatsu Ri, Ziran Yang, Shange Tang, Xingyu Dang, Hongzhou Lin, Mengdi Wang, Danqi Chen, Chi Jin, Liam H Fowl, Sanjeev Arora

General AI

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemma…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Pretraining Recurrent Networks without Recurrence

2026-06-04 · Akarsh Kumar, Phillip Isola

General AI

Training recurrent neural networks (RNNs) requires assigning credit across long sequences of computations. Standard backpropagation through time (BPTT) addresses this problem poorly: it is sequential in time, limiting parallelism, and suffers from vanishing or exploding gradients, making long-range associations difficu…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals

2026-06-04 · Thamilvendhan Munirathinam

General AI

As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it has valid credentials) or hard-fail it (indistinguishable from any other client).…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading

2026-06-08 · Eder del Blanco, David Gimeno-Gómez, Eva Navas, Carlos-D. Martínez-Hinarejos, Inma Hernáez

General AI

Speech restoration through silent speech interfaces (SSIs) has emerged as a promising assistive technology for individuals with impaired or absent laryngeal voice production. Among non-invasive SSI modalities, surface electromyography (sEMG) and video-based lipreading provide complementary articulatory information, yet…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Discovering Functionally Selective Brain Regions with a Deep Topographic Multimodal Model

2026-06-08 · Badr AlKhamissi, Johannes Mehrer, Lara Marinov, Ahmed Abdelaal, Abdulkadir Gokce, Martin Schrimpf

General AI

Nearby neurons in cortex share similar response profiles, producing systematic spatial organization across sensory and cognitive systems. Recent topographic models reproduce aspects of this structure but remain unimodal and spatially constrain each layer separately, yielding fragmented maps that capture neither the con…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

SynManDex: Synthesizing Human-like Dexterous Grasps from Synthetic Human Pre-Grasps

2026-06-08 · Yanming Shao, Zanxin Chen, Wenwei Lin, Mingjie Zhou, Tianxing Chen, Xiaokang Yang, Yichen Chi, Yao Mu

General AI

Human hand-object interactions encode functional intent, but direct transfer to robotic hands often fails under morphology, contact, and reachability constraints. We present SynManDex, a synthetic pipeline that uses generated human pre-grasps as affordance-aware proposals and resolves the final contacts with robot-nati…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

AnyMod-LLVE: Low-Light Video Enhancement with Modality-Agnostic Inference

2026-06-09 · Hangfeng Liang, Yutao Hu, Yanhan Hu, Xiaohan Wu, Wenqi Shao, Ying Fu

General AI

Low-light video enhancement (LLVE) remains a challenging task due to severe information degradation under low-illumination conditions. Recent multimodal approaches have significantly improved enhancement performance by incorporating auxiliary modalities, such as event streams and infrared images. However, these methods…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Instruction Finetuning DeepSeek-R1-8B Model Using LoRA and NEFTune

2026-06-09 · Wu Yuerong, Mingni Luo

General AI

Financial named-entity recognition (NER) is essential for translating unstructured financial reports and news into structured knowledge graphs. However, general-purpose large language models (LLMs) often misclassify financial entities or ignore domain-specific patterns. This paper investigates the use of DeepSeek-R1-8B…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

TacForeSight: Force-Guided Tactile World Model for Contact-Rich Manipulation

2026-06-09 · Yujie Zang, Yuhang Zheng, Xian Nie, Yupeng Zheng, Shuai Tian, Songen Gu, Chen Gao, Zining Wang, Shuicheng Yan, Wenchao Ding

General AI

Contact-rich manipulation requires robots to continuously perceive and regulate evolving physical interactions under dynamic contact transitions or complex surface geometries. Recent imitation learning methods improve contact-aware control by incorporating tactile or force feedback, but they rarely model the asymmetric…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

SDS-LoRA: Overcoming Anisotropic Gradient Scaling in Low-Rank Adaptation

2026-06-15 · Junghun Oh, Sungyong Baik, Kyoung Mu Lee

General AI

Low-Rank Adaptation (LoRA) enables efficient adaptation of large pre-trained models to downstream tasks by parameterizing weight updates with low-rank matrices. In this paper, we investigate the limitations of the LoRA parameterization from a geometric perspective. Specifically, we show that when a full fine-tuning gra…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

2026-06-15 · Mingyang Li, Yurou Liu, Jieping Ye, Bing Su, Ji-Rong Wen, Zheng Wang

General AI

In this report, we present LOGOS (Language Of Generative Objects in Science), a scientific generative language model that unifies heterogeneous tasks across the natural sciences within a single autoregressive framework based on a shared scientific grammar. It encodes diverse scientific objects and their spatial interac…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Darshana Graph: A Parallel Commentary Corpus for Comparative Indian Philosophy, with Stylometric and Exploratory Graph Analyses

2026-06-16 · Joy Bose

General AI

We introduce Darshana Graph, a corpus of over 125,000 text records spanning classical Hindu, Buddhist, and Jain philosophical traditions, drawn from public-domain and openly licensed translations of sources including the Bhagavad Gita, Brahma Sutras, principal Upanishads, the Pali Canon, and core Jain texts. Its distin…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Future Dynamic 3D Reconstruction: A 3D World Model with Disentangled Ego-Motion

2026-06-16 · Nils Morbitzer, Jonathan Evers, Artem Savkin, Thomas Stauner, Nassir Navab, Federico Tombari, Stefano Gasperini

General AI

Forecasting the evolution of dynamic environments is crucial for autonomous agents. While generative world models have recently achieved high photorealism in 2D video synthesis by mixing ego-motion and environmental dynamics within the image plane, they exhibit physical inconsistencies, such as morphing or vanishing ob…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

IUU+DB: Tracking Illegal, Unreported, and Unregulated Fishing, Seafood Fraud, and Labor Abuse through LLM-driven Information Extraction

2026-06-16 · Henry Bodwell, Hong Yang, John C. Simeone, Kelvin Gorospe, Bella Sullivan, Lana Huang, Jessica Gephart, Sandy Aylesworth, Molly Masterton, Naren Ramakrishnan

General AI

Illegal, unreported, and unregulated fishing (IUU) traditionally refers to fishing activities that violate applicable laws or occur in areas that lack applicable laws. We propose the term IUU+ to capture a broader suite of fisheries sector environmental and associated supply chain trade-related crimes and behaviors. Al…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

MOCHI: Motion Enhancement of Collaborative Human-object Interactions

2026-06-16 · Jiye Lee, Yonghun Choi, Jungdam Won

General AI

Collaborative human-object interaction shows dynamic and complex movements that require mutual anticipation and continuous adjustment between participants and the shared object. Modeling such collaborative multi-human object interaction (MHOI) scenarios requires high-quality data acquisition as a foundational step; how…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Optimal scenario design for climate emulation

2026-06-17 · Christopher B. Womack, Shahine Bouabid, Andrei Sokolov, Popat Salunke, Glenn Flierl, Sebastian D. Eastham, Noelle E. Selin

General AI

As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenario…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 5.3

Zero-Shot Long-Horizon Dexterous Manipulation via Multi-View 3D-Grounded VLM Reasoning

2026-06-17 · Jisoo Kim, Sangwon Baik, Taeksoo Kim, Sungjoo Kim, Junyoung Lee, Mingi Choi, Hanbyul Joo

General AI

We present a zero-shot framework for long-horizon dexterous manipulation that grounds language instructions into executable 3D task plans from calibrated multi-view RGB images. Rather than training an end-to-end policy, our system uses a vision-language model (VLM) to produce reference-frame task grounding and primitiv…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 5.0

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

2026-05-12 · Bo Yin, Qi Li, Xinchao Wang

General AI

Tool-using LLM agents fail through trajectories rather than only final responses, as they may execute unsafe tool calls, follow injected instructions, comply with harmful requests, or over-refuse benign tasks despite producing a seemingly safe answer. Existing safety-alignment signals are largely response-level or off-…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 4.8

Efficient and Adaptive Human Activity Recognition via LLM Backbones

2026-05-12 · Aleksandr Bredikhin, Philippe Lalanda, German Vega

General AI

Human Activity Recognition (HAR) is a core task in pervasive computing systems, where models must operate under strict computational constraints while remaining robust to heterogeneous and evolving deployment conditions. Recent advances based on Transformer architectures have significantly improved recognition performa…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 4.8

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

2026-05-12 · Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, Weiyang Liu

General AI

We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular values throughout traini…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 4.8

Predicting Disagreement with Human Raters in LLM-as-a-Judge Difficulty Assessment without Using Generation-Time Probability Signals

2026-05-12 · Yo Ehara

General AI

Automatic generation of educational materials using large language models (LLMs) is becoming increasingly common, but assigning difficulty levels to such materials still requires substantial human effort. LLM-as-a-Judge has therefore attracted attention, yet disagreement with human raters remains a major challenge. We …

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 4.8

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

2026-05-12 · Ariel Gera, Shir Ashury-Tahan, Gal Bloch, Ohad Eytan, Assaf Toledo

General AI

We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling e…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 4.8

U-STS-LLM: A Unified Spatio-Temporal Steered Large Language Model for Traffic Prediction and Imputation

2026-05-12 · Yichen Zhang, Jun Li

General AI

The efficient operation of modern cellular networks hinges on the accurate analysis of spatio-temporal traffic data. Mastering these patterns is essential for core network functions, chiefly forecasting future load to pre-empt congestion and imputing missing values caused by sensor failures or transmission errors to en…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 4.5

HDP: A Lightweight Cryptographic Protocol for Human Delegation Provenance in Agentic AI Systems

2026-04-06 · Asiri Dalugoda

General AI

Agentic AI systems increasingly execute consequential actions on behalf of human principals, delegating tasks through multi-step chains of autonomous agents. No existing standard addresses a fundamental accountability gap: verifying that terminal actions in a delegation chain were genuinely authorized by a human princi…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 4.5

QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization

2026-04-07 · Changxin Ke, Rui Zhang, Jiaming Guo, Yuanbo Wen, Li Ding, Shuo Wang, Xuyuan Zhu, Xiong Peng, Di Huang, Zidong Du, Xing Hu, Qi Guo, Yunji Chen

General AI

Large Language Models (LLMs) achieve strong program repair performance but often suffer from over-editing, where excessive modifications overwrite correct code and hinder bug localization. We systematically quantify its impact and introduce the precise repair task, which maximizes reuse of correct code while fixing only bu…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 4.5

Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3

2026-04-16 · Natapong Nitarach

General AI

Majority voting over multiple LLM attempts improves mathematical reasoning, but correlated errors limit the effective sample size. A natural fix is to assign different reasoning strategies to different voters. The approach, Diverse Prompt Mixer, is tested on the AIMO 3 competition: 3 models, 23+ experiments, 50 IMO-lev…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 4.5

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

2026-04-20 · Qifan Zhang, Dongyang Ma, Tianqing Fang, Jia Li, Jing Tang, Nuo Chen, Haitao Mi, Yan Wang

General AI

Most agents today "self-evolve" by following rewards and rules defined by humans. However, this process remains fundamentally dependent on external supervision; without human guidance, the evolution stops. In this work, we train agents to possess an intrinsic meta-evolution capability to spontaneously learn about uns…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 4.5

TEMPO: Scaling Test-time Training for Large Reasoning Models

2026-04-21 · Qingyang Zhang, Xinke Kong, Haitao Wu, Qinghua Hu, Minghao Wu, Baosong Yang, Yu Cheng, Yun Luo, Ganqu Cui, Changqing Zhang

General AI

Test-time training (TTT) adapts model parameters on unlabeled test instances during inference time, which continuously extends capabilities beyond the reach of offline training. Despite initial gains, existing TTT methods for LRMs plateau quickly and do not benefit from additional test-time compute. Without external ca…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 4.5

IAM: Identity-Aware Human Motion and Shape Joint Generation

2026-04-28 · Wenqi Jia, Zekun Li, Abhay Mittal, Chengcheng Tang, Chuan Guo, Lezi Wang, James Matthew Rehg, Lingling Tao, Size An

General AI

Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morpholog…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 4.5

Toward Scalable Terminal Task Synthesis via Skill Graphs

2026-04-28 · Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng Zhang, Lilin Wang

General AI

Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for trajectory sampling. H…

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 4.5

Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

2026-04-29 · Hayate Iso, Tiyasa Mitra, Sudipta Mondal, Rasoul Shafipour, Venmugil Elango, Terry Kong, Yuki Huang, Seonjin Na, Izzy Putterman, Benjamin Chislett, Maor Ashkenazi, Joseph Guman, Gerald Shen, Tugrul Konuk, Ashwath Aithal, Ritika Borkar, Ran Zilberstein, Bita Rouhani

General AI

RL post-training of frontier language models is increasingly bottlenecked by autoregressive rollout generation, making rollout acceleration a central systems challenge. Many existing efficiency methods improve throughput by changing the rollout or optimization regime, for example, through off-policy execution, replay, …

Review: pending
Role: unreviewed
Read: soon

Open source Details

huggingface Score 4.5

FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing

2026-04-29 · Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt

General AI

Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images across 15 fashion houses spanning 1991-2024 that makes this cultural logic inspectable. Given a photograph of…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 4.3

Frequency Switching Mechanism for Parameter-Efficient Multi-Task Learning

2026-03-22 · Shih-Wen Liu, Yen-Chang Chen, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang

General AI

Multi-task learning (MTL) aims to enable a single model to solve multiple tasks efficiently; however, current parameter-efficient fine-tuning (PEFT) methods remain largely limited to single-task adaptation. We introduce Free Sinewich, a parameter-efficient multi-task learning framework that enables near-zero-c…

Review: pending
Role: unreviewed
Read: now

Open source Details

arxiv Score 4.3

Designing Any Imaging System from Natural Language: Agent-Constrained Composition over a Finite Primitive Basis

2026-03-26 · Chengshuai Yang

General AI

Designing a computational imaging system -- selecting operators, setting parameters, validating consistency -- requires weeks of specialist effort per modality, creating an expertise bottleneck that excludes the broader scientific community from prototyping imaging instruments. We introduce spec.md, a structured specif…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 4.3

The Kitchen Loop: User-Spec-Driven Development for a Self-Evolving Codebase

2026-03-26 · Yannick Roy

General AI

Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User x 1000', where an L…

Review: pending
Role: unreviewed
Read: soon

Open source Details

arxiv Score 4.3

EpiScreen: Early Epilepsy Detection from Electronic Health Records with Large Language Models

2026-03-30 · Shuang Zhou, Kai Yu, Zaifu Zhan, Huixue Zhou, Min Zeng, Feng Xie, Zhiyi Sha, Rui Zhang

General AI

Epilepsy and psychogenic non-epileptic seizures often present with similar seizure-like manifestations but require fundamentally different management strategies. Misdiagnosis is common and can lead to prolonged diagnostic delays, unnecessary treatments, and substantial patient morbidity. Although prolonged video-electr…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Geometry-aware similarity metrics for neural representations on Riemannian and statistical manifolds

2026-03-30 · N Alex Cayco Gajic, Arthur Pellegrino

General AI

Similarity measures are widely used to interpret the representational geometries used by neural networks to solve tasks. Yet, because existing methods compare the extrinsic geometry of representations in state space, rather than their intrinsic geometry, they may fail to capture subtle yet crucial distinctions between …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Multimodal Analytics of Cybersecurity Crisis Preparation Exercises: What Predicts Success?

2026-03-30 · Conrad Borchers, Valdemar Švábenský, Sandesh K. Kafle, Kevin K. Tang, Jan Vykopal

General AI

Instructional alignment, the match between intended cognition and enacted activity, is central to effective instruction but hard to operationalize at scale. We examine alignment in cybersecurity simulations using multimodal traces from 23 teams (76 students) across five exercise sessions. Study 1 codes objectives and t…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Rethinking Language Model Scaling under Transferable Hypersphere Optimization

2026-03-30 · Liliang Ren, Yang Liu, Yelong Shen, Weizhu Chen

General AI

Scaling laws for large language models depend critically on the optimizer and parameterization. Existing hyperparameter transfer laws are mainly developed for first-order optimizers, and they do not structurally prevent training instability at scale. Recent hypersphere optimization methods constrain weight matrices to …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Learning Structural-Functional Brain Representations through Multi-Scale Adaptive Graph Attention for Cognitive Insight

2026-03-31 · Badhan Mazumder, Sir-Lord Wiafe, Aline Kotoski, Vince D. Calhoun, Dong Hye Ye

General AI

Understanding how brain structure and function interact is key to explaining intelligence, yet modeling them jointly is challenging because the structural and functional connectomes capture complementary aspects of organization. We introduce Multi-scale Adaptive Graph Network (MAGNet), a Transformer-style graph neural networ…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Are Latent Reasoning Models Easily Interpretable?

2026-04-06 · Connor Dilgren, Sarah Wiegreffe

General AI

Latent reasoning models (LRMs) have attracted significant research interest due to their low inference cost (relative to explicit reasoning models) and theoretical ability to explore multiple reasoning paths in parallel. However, these benefits come at the cost of reduced interpretability: LRMs are difficult to monitor…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection

2026-04-06 · Yang Li, Qiang Sheng, Zhengjia Wang, Yehan Yang, Danding Wang, Juan Cao

General AI

The misuse of large language models (LLMs) requires precise detection of synthetic text. Existing works mainly follow binary or ternary classification settings, which can only distinguish pure human/LLM text or collaborative text at best. This remains insufficient for nuanced regulation, as the LLM-polished human t…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection

2026-04-06 · Vadim Vashkelis, Natalia Trukhina

General AI

Mixture-of-Experts (MoE) architectures enable conditional computation by activating only a subset of model parameters for each input. Although sparse routing has been highly effective in language models and has also shown promise in vision, most vision MoE methods operate at the image or patch level. This granularity i…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Your Pre-trained Diffusion Model Secretly Knows Restoration

2026-04-06 · Sudarshan Rajagopalan, Vishal M. Patel

General AI

Pre-trained diffusion models have enabled significant advancements in All-in-One Restoration (AiOR), offering improved perceptual quality and generalization. However, diffusion-based restoration methods primarily rely on fine-tuning or Control-Net style modules to leverage the pre-trained diffusion model's priors for A…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Artificial Intelligence and the Structure of Mathematics

2026-04-07 · Maissam Barkeshli, Michael R. Douglas, Michael H. Freedman

General AI

Recent progress in artificial intelligence (AI) is unlocking transformative capabilities for mathematics. There is great hope that AI will help solve major open problems and autonomously discover new mathematical concepts. In this essay, we further consider how AI may open a grand perspective on mathematics by forging …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems

2026-04-07 · Yasmeen Saeed, Ahmed Sharshar, Mohsen Guizani

General AI

Detecting cyberattacks in photovoltaic (PV) monitoring and MPPT control signals requires models that are robust to bias, drift, and transient spikes, yet lightweight enough for resource-constrained edge controllers. While deep learning outperforms traditional physics-based diagnostics and handcrafted features, standard…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models

2026-04-07 · Zhengming Yu, Li Ma, Mingming He, Leo Isikdogan, Yuancheng Xu, Dmitriy Smirnov, Pablo Salamanca, Dao Mi, Pablo Delgado, Ning Yu, Julien Philip, Xin Li, Wenping Wang, Paul Debevec

General AI

Most digital videos are stored in 8-bit low dynamic range (LDR) formats, where much of the original high dynamic range (HDR) scene radiance is lost due to saturation and quantization. This loss of highlight and shadow detail precludes mapping accurate luminance to HDR displays and limits meaningful re-exposure in post-…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering

2026-04-07 · Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo, Parisa Hamedi, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani

General AI

Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise in translating low-level representations into high-level source code, most exis…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

2026-04-09 · Jiayuan Ye, Vitaly Feldman, Kunal Talwar

General AI

Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on knowledge-intensive tasks. In this paper, we formalize fact memorization from an information-theoretic perspective and study how training data distributions affect fact ac…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

2026-04-13 · Junhao Chen, Kejun Gao, Yuehan Cui, Mingze Sun, Mingjin Chen, Shaohui Wang, Xiaoxiao Long, Fei Ma, Qi Tian, Ruqi Huang, Hao Zhao

General AI

Despite rapid progress in video generation, existing models are incapable of producing vector animation, a dominant and highly expressive form of multimedia on the Internet. Vector animations offer resolution-independence, compactness, semantic structure, and editable parametric motion representations, yet current gene…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Pair2Scene: Learning Local Object Relations for Procedural Scene Generation

2026-04-13 · Xingjian Ran, Shujie Zhang, Weipeng Zhong, Li Luo, Bo Dai

General AI

Generating high-fidelity 3D indoor scenes remains a significant challenge due to data scarcity and the complexity of modeling intricate spatial relations. Current methods often struggle to scale beyond training distribution to dense scenes or rely on LLMs/VLMs that lack the ability for precise spatial reasoning. Buildi…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Learning Versatile Humanoid Manipulation with Touch Dreaming

2026-04-14 · Yaru Niu, Zhenlong Fang, Binghong Chen, Shuai Zhou, Revanth Senthilkumaran, Hao Zhang, Bingqing Chen, Chen Qiu, H. Eric Tseng, Jonathan Francis, Ding Zhao

General AI

Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first de…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

PAL: Personal Adaptive Learner

2026-04-14 · Megha Chakraborty, Darssan L. Eswaramoorthi, Madhur Thareja, Het Riteshkumar Shah, Finlay Palmer, Aryaman Bahl, Michelle A Ihetu, Amit Sheth

General AI

AI-driven education platforms have made some progress in personalisation, yet most remain constrained to static adaptation--predefined quizzes, uniform pacing, or generic feedback--limiting their ability to respond to learners' evolving understanding. This shortfall highlights the need for systems that are both context…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Probabilistic Feature Imputation and Uncertainty-Aware Multimodal Federated Aggregation

2026-04-14 · Nafis Fuad Shahid, Maroof Ahmed, Md Akib Haider, Saidur Rahman Sagor, Aashnan Rahman, Md Azam Hossain

General AI

Multimodal federated learning enables privacy-preserving collaborative model training across healthcare institutions. However, a fundamental challenge arises from modality heterogeneity: many clinical sites possess only a subset of modalities due to resource constraints or workflow variations. Existing approaches addre…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Scalable Trajectory Generation for Whole-Body Mobile Manipulation

2026-04-14 · Yida Niu, Xinhai Chang, Xin Liu, Ziyuan Jiao, Yixin Zhu

General AI

Robots deployed in unstructured environments must coordinate whole-body motion -- simultaneously moving a mobile base and arm -- to interact with the physical world. This coupled mobility and dexterity yields a state space that grows combinatorially with scene and object diversity, demanding datasets far larger than th…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Agent-Aided Design for Dynamic CAD Models

2026-04-16 · Mitch Adler, Matthew Russo, Michael Cafarella

General AI

In the past year, researchers have started to create agentic systems that can design real-world CAD-style objects in a training-free setting, a new variety of system that we call Agent-Aided Design. Generally speaking, these systems place an agent in a feedback loop in which it can write code, compile that code to an a…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

2026-04-16 · Manan Gupta, Dhruv Kumar

General AI

LLM-as-judge frameworks are increasingly used for automatic NLG evaluation, yet their per-instance reliability remains poorly understood. We present a two-pronged diagnostic toolkit applied to SummEval: (1) a transitivity analysis that reveals widespread per-input inconsistency masked by low aggregate violat…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Why Do Vision Language Models Struggle To Recognize Human Emotions?

2026-04-16 · Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara, Steven McDonagh

General AI

Understanding emotions is a fundamental ability for intelligent systems to be able to interact with humans. Vision-language models (VLMs) have made tremendous progress in the last few years for many visual tasks, potentially offering a promising solution for understanding emotions. However, it is surprising that even t…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation

2026-04-20 · Rui Qian, Chuanhang Deng, Qiang Huang, Jian Xiong, Mingxuan Li, Yingbo Zhou, Wei Zhai, Jintao Chen, Dejing Dou

General AI

Reasoning segmentation requires models to ground complex, implicit textual queries into precise pixel-level masks. Existing approaches rely on a single segmentation token <SEG>, whose hidden state implicitly encodes both semantic reasoning and spatial localization, limiting the model's ability to explicitly …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale

2026-04-20 · A. Sophia Koepke, Daniil Zverev, Shiry Ginosar, Alexei A. Efros

General AI

The Platonic Representation Hypothesis suggests that neural networks trained on different modalities (e.g., text and images) align and eventually converge toward the same representation of reality. If true, this has significant implications for whether modality choice matters at all. We show that the experimental evide…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Dual Alignment Between Language Model Layers and Human Sentence Processing

2026-04-20 · Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki, Ethan Gotlieb Wilcox

General AI

A recent study (Kuribayashi et al., 2025) has shown that human sentence processing behavior, typically measured on syntactically unchallenging constructions, can be effectively modeled using surprisal from early layers of large language models (LLMs). This raises the question of whether such advantages of internal laye…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Evolutionary Negative Module Pruning for Better LoRA Merging

2026-04-20 · Anda Cao, Zhuo Gou, Yi Wang, Kaixuan Chen, Yu Wang, Can Wang, Mingli Song, Jie Song

General AI

Merging multiple Low-Rank Adaptation (LoRA) experts into a single backbone is a promising approach for efficient multi-task deployment. While existing methods strive to alleviate interference via weight interpolation or subspace alignment, they rest upon the implicit assumption that all LoRA matrices contribute constru…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

2026-04-20 · Haoyu Wu, Jiwen Yu, Yingtian Zou, Xihui Liu

General AI

Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing approaches are limited …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy

2026-04-20 · Wei Yao, Haohan Ma, Hongwen Zhang, Yunlian Sun, Liangjun Xing, Zhile Yang, Yuanjun Guo, Yebin Liu, Jinhui Tang

General AI

Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in multi-agent coordination, and limited generalization across objects. In this paper, we present SynAgent, a unified framework that enables scalable and physicall…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs

2026-04-21 · Isaiah Thompson, Tanmay Sen, Ritwik Bhattacharya

General AI

Modern distributed systems generate massive volumes of log data that are critical for detecting anomalies and cyber threats. However, in real world settings, these logs are often distributed across multiple organizations and cannot be centralized due to privacy and security constraints. Existing log anomaly detection m…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

InHabit: Leveraging Image Foundation Models for Scalable 3D Human Placement

2026-04-21 · Nikita Kister, Pradyumna YM, István Sárándi, Jiayi Wang, Anna Khoreva, Gerard Pons-Moll

General AI

Training embodied agents to understand 3D scenes as humans do requires large-scale data of people meaningfully interacting with diverse environments, yet such data is scarce. Real-world motion capture is costly and limited to controlled settings, while existing synthetic datasets rely on simple geometric heuristics tha…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Structure-guided molecular design with contrastive 3D protein-ligand learning

2026-04-21 · Carles Navarro, Philipp Tholke, Gianni de Fabritiis

General AI

Structure-based drug discovery faces the dual challenge of accurately capturing 3D protein-ligand interactions while navigating ultra-large chemical spaces to identify synthetically accessible candidates. In this work, we present a unified framework that addresses these challenges by combining contrastive 3D structure …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

2026-04-21 · Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge

General AI

Scaling humanoid foundation models is bottlenecked by the scarcity of robotic data. While massive egocentric human data offers a scalable alternative, bridging the cross-embodiment chasm remains a fundamental challenge due to kinematic mismatches. We introduce UniT (Unified Latent Action Tokenizer via Visual Anchoring)…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Interval POMDP Shielding for Imperfect-Perception Agents

2026-04-22 · William Scarbro, Ravi Mangal

General AI

Autonomous systems that rely on learned perception can make unsafe decisions when sensor readings are misclassified. We study shielding for this setting: given a proposed action, a shield blocks actions that could violate safety. We consider the common case where system dynamics are known but perception uncertainty mus…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Reliability as a Design Principle: A Systematic Review and Integrated Framework for Renewable-Based Microgrids

2026-04-22 · Mohammed Zeehan Saleheen, Markus Wagner, Reza Razzaghi, Hao Wang

General AI

Reliable operation is a central motivation for deploying renewable-based microgrids. This paper presents a systematic rapid review that positions reliability as the central organizing principle for microgrid design. Specifically, this review systematically synthesizes recent literature to examine how planning assumptio…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

2026-04-23 · Hao-Yu Hsu, Tianhang Cheng, Jing Wen, Alexander G. Schwing, Shenlong Wang

General AI

Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts pu…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

2026-04-23 · Yanran Zhang, Wenzhao Zheng, Yifei Li, Bingyao Yu, Yu Zheng, Lei Chen, Jiwen Lu, Jie Zhou

General AI

In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent, development, these two fields have evolved distinct architectural paradigms: the former predominantly relies on generative networks, while the latter favors discrimin…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Complexity of Linear Regions in Self-supervised Deep ReLU Networks

2026-04-27 · Mufhumudzi Muthivhi, Terence L. van Zyl

General AI

There has been growing interest in studying the complexity of Rectified Linear Unit (ReLU) based activation networks. Recent work investigates the evolution of the number of piecewise-linear partitions (linear regions) that are formed during training. However, current research is limited to examining the complexity of …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

From Syntax to Emotion: A Mechanistic Analysis of Emotion Inference in LLMs

2026-04-28 · Bangzhao Shu, Arinjay Singh, Mai ElSherief

General AI

Large language models (LLMs) are increasingly used in emotionally sensitive human-AI applications, yet little is known about how emotion recognition is internally represented. In this work, we investigate the internal mechanisms of emotion recognition in LLMs using sparse autoencoders (SAEs). By analyzing sparse featur…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Artistic Practice Opportunities in CST Evaluations: A Longitudinal Group Deployment of ArtKrit

2026-04-29 · Catherine Liu, Tao Long, Asya Vaisberg, Chau Vu, Jiaju Ma, Jingyi Li

General AI

Creativity support tools (CSTs) aim to elevate the quality of artists' creative processes and artifacts. Yet most current CST evaluations overlook temporal and social aspects of tool use. To address this gap, we present a longitudinal, group-based CST evaluation through a three-week deployment of ArtKrit, a computation…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Causal Learning with Neural Assemblies

2026-04-29 · Evangelia Kopadi, Dimitris Kalles

General AI

Can Neural Assemblies -- groups of neurons that fire together and strengthen through co-activation -- learn the direction of causal influence between variables? While established as a computationally general substrate for classification, parsing, and planning, neural assemblies have not yet been shown to internalize ca…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation

2026-04-29 · Yuxuan Tian, Yurun Jin, Bin Yu, Yukun Shi, Hao Wu, Chi Harold Liu, Kai Chen, Cong Huang

General AI

Robotic manipulation critically requires reasoning about future spatial-temporal interactions, yet existing VLA policies and world-model-enhanced policies do not fully model action-relevant spatial-temporal interaction structure. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-tem…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Uncertainty-Aware Pedestrian Attribute Recognition via Evidential Deep Learning

2026-04-29 · Zhuofan Lou, Shihang Zhang, Fangle Zhu, Shengjie Ye, Pingyu Wang

General AI

We propose UAPAR, an Uncertainty-Aware Pedestrian Attribute Recognition framework. To the best of our knowledge, this is the first EDL-based uncertainty-aware framework for pedestrian attribute recognition (PAR). Unlike conventional deterministic methods, which fail to assess prediction reliability on low-quality sampl…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

An Empirical Evaluation of Code Smell Detection in Angular Applications

2026-04-30 · Maykon Nunes, Emanuel Coutinho, Carla Bezerra, Ivan Machado

General AI

Angular is one of the most widely adopted frameworks for developing large-scale, dynamic web applications. As projects increase in scope and complexity, developers face growing challenges in managing architecture and maintaining clean, modular code. These challenges often lead to design flaws, commonly referred to as c…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

2026-04-30 · Tianyuan Wu, Chaokun Chang, Lunxi Cao, Wei Gao, Wei Wang

General AI

Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback, yet existing approaches fall into two extremes: application-l…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons

2026-04-30 · Kehong Gong, Zhengyu Wen, Dao Thien Phong, Mingxi Xu, Weixia He, Qi Wang, Ning Zhang, Zhengyu Li, Guanli Hou, Dongze Lian, Xiaoyu He, Mingyuan Zhang, Hanwang Zhang

General AI

Recent methods for arbitrary-skeleton motion capture from monocular video follow a factorized pipeline, where a Video-to-Pose network predicts joint positions and an analytical inverse-kinematics (IK) stage recovers joint rotations. While effective, this design is inherently limited, since joint positions do not fully …

Review: pending
Role: unreviewed
Read: soon

arxiv Score 4.3

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

2026-04-30 · Junyoung Lee, Sookwan Han, Jeonghwan Kim, Inhee Lee, Mingi Choi, Jisoo Kim, Wonjung Woo, Hanbyul Joo

General AI

Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where multiple humans and robots share a workspace, acting concurrently on interleaved subtasks with tight spatial and temporal coupling. This regime remains underexplored because …

Review: pending
Role: unreviewed
Read: soon
