Research Paper Cockpit

Daily Digest - 2026-05-05

Papers first seen in this daily snapshot.

Research Workflow

Latest digest: 2026-05-13.

Papers

59 visible entries

arxiv Score 26.4

PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning

2026-05-01 · Beining Wu, Zihao Ding, Jun Huang

Research Track A · General AI

While current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) is built on the unverified assumption that routing isolates task-specific knowledge into disjoint experts, we argue that routing operates per-sample, while forgetting accumulates across the task sequence, and gr…

Review: pending · Role: unreviewed · Read: now

arxiv Score 21.2

Towards Multi-Agent Autonomous Reasoning in Hydrodynamics

2026-05-01 · Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson

General AI

Single-agent systems (SAS) have become the default pattern for LLM-driven scientific workflows, but routing planning, tool use, and synthesis through a single context window comes with a well-known cost: as tool specifications and observational traces accumulate, the effective context available for each decision shrink…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.2

FT-RAG: A Fine-grained Retrieval-Augmented Generation Framework for Complex Table Reasoning

2026-05-02 · Zebin Guo, Weidong Geng, Ruichen Mao

General AI

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding responses in external knowledge during inference. However, conventional RAG systems underperform on structured tabular data, largely due to coarse retrieval granularity and insufficient table semantic comprehension. To address these…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.4

Forager: a lightweight testbed for continual learning with partial observability in RL

2026-05-01 · Steven Tang, Xinze Xiong, Anna Hakhverdyan, Andrew Patterson, Jacob Adkins, Jiamin He, Esraa Elelimy, Parham Mohammad Panahi, Martha White, Adam White

Research Track A · General AI

In continual reinforcement learning (CRL), good performance requires never-ending learning, acting, and exploration in a big, partially observable world. Most CRL experiments have focused on loss of plasticity -- the inability to keep learning -- in one-off experiments where some unobservable non-stationarity is added …

Review: pending · Role: unreviewed · Read: now

huggingface Score 18.4

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

2026-05-01 · Ziwen Zhao, Menglin Yang

General AI

Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cro…

Review: pending · Role: unreviewed · Read: now

huggingface Score 18.4

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments

2026-05-04 · Ruoqi Liu, Imran Q. Mohiuddin, Austin J. Schoeffler, Kavita Renduchintala, Ashwin Nayak, Prasantha L. Vemu, Shivam C. Vedak, Kameron C. Black, John L. Havlik, Isaac Ogunmola, Stephen P. Ma, Roopa Dhatt, Jonathan H. Chen

General AI

We introduce PhysicianBench, a benchmark for evaluating LLM agents on physician tasks grounded in real clinical settings within electronic health record (EHR) environments. Existing medical agent benchmarks primarily focus on static knowledge recall, single-step atomic actions, or action intent without verifiable execut…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.2

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

2026-05-03 · Arash Ahmadi, Sarah Sharif, Yaser Banad

General AI

Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives policy optimization. This paper introduc…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.2

Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

2026-05-04 · Chenchen Zhang

General AI

As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individual actions but also how work is spawned, delegated, communicated, aggregated, and stopped. This paper studies RL for LLM-based multi-agent systems through orchestration…

Review: pending · Role: unreviewed · Read: now

arxiv Score 17.9

MPCS: Neuroplastic Continual Learning via Multi-Component Plasticity and Topology-Aware EWC

2026-05-04 · Joern Hentsch

Research Track A · General AI

Continual learning systems face a fundamental tension between plasticity -- acquiring new knowledge -- and stability -- retaining prior knowledge. We introduce MPCS (Multi-Plasticity Continual System), a neuroplastic architecture that integrates eleven complementary mechanisms: task-driven neurogenesis, Fourier-encoded…

Review: pending · Role: unreviewed · Read: now

arxiv Score 17.4

Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery

2026-05-02 · Wenhao Li, Xiu Su, Yichao Cao, Hongyan Xu, Xiaobo Xia, Shan You, Yi Chen, Chang Xu

Research Track A · General AI

Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning capability, lack of status monitoring, and difficulty in self-correction. In this…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.9

Sequential Learning and Catastrophic Forgetting in Differentiable Resistor Networks

2026-05-02 · Maniru Ibrahim

Research Track A

Differentiable physical networks provide a simple setting in which learning can be studied through the interaction between trainable parameters and physical equilibrium constraints. We investigate sequential learning in differentiable resistor networks governed by Kirchhoff's laws. Although individual input-output map…

Review: pending · Role: unreviewed · Read: now

huggingface Score 15.4

T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

2026-05-04 · Haixin Wang, Hejie Cui, Chenwei Zhang, Xin Liu, Shuowei Jin, Shijie Geng, Xinyang Zhang, Nasser Zalmout, Zhenyu Shi, Yizhou Sun

General AI

Recent progress in multi-turn reinforcement learning (RL) has significantly improved the performance of reasoning LLMs on complex interactive tasks. Despite advances in stabilization techniques such as fine-grained credit assignment and trajectory filtering, instability remains pervasive and often leads to training collapse…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.2

Semantic Risk-Aware Heuristic Planning for Robotic Navigation in Dynamic Environments: An LLM-Inspired Approach

2026-05-04 · Hamza Ahmed Durrani, Rafay Suleman Durrani

General AI

The integration of Large Language Model (LLM) reasoning principles into classical robot path planning represents a rapidly emerging research direction. In this paper, we propose a Semantic Risk-Aware Heuristic (SRAH) planner that encodes LLM-inspired cost functions penalising geometrically cluttered or high-risk zones …

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.2

Tool Use as Action: Towards Agentic Control in Mobile Core Networks

2026-05-04 · Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi, Xueli An

General AI

Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes in the design of network entities, interfaces, and procedures. The adoption of agentic AI in next-generation networks is expected to enhance network intelligence and auto…

Review: pending · Role: unreviewed · Read: now

huggingface Score 15.0

OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models

2026-04-25 · Yida Xue, Ningyu Zhang, Tingwei Wu, Zhe Ma, Daxiong Ji, Zhao Wang, Guozhou Zheng, Huajun Chen

General AI

The vast and underexplored ocean plays a critical role in regulating global climate and supporting marine biodiversity, yet artificial intelligence has so far delivered limited impact in this domain due to a fundamental data bottleneck. Specifically, ocean data are highly fragmented across disparate sources and inheren…

Review: pending · Role: unreviewed · Read: now

huggingface Score 14.4

AcademiClaw: When Students Set Challenges for AI Agents

2026-05-04 · Junjie Yu, Pengrui Lu, Weiye Si, Hongliang Lu, Jiabao Wu, Kaiwen Tao, Kun Wang, Lingyu Yang, Qiran Zhang, Xiuting Guo, Xuanyu Wang, Yang Wang, Yanjie Wang, Yi Yang, Zijian Hu, Ziyi Yang, Zonghan Zhou, Binghao Qiang, Borui Zhang, Chenning Li, Enchang Zhang, Feifan Chen, Feng Jian, Fengyin Sun, Hao Qiu, Hao Zheng, Haoran Zhu, Hongyu Liu, Jianbin Deng, Jiaxin Song, Jiaying Chi, Jiayou Shi, Jie Fang, Jinghui Zhong, Jingyu Zhou, Jinze Li, Junfeng Yi, Junyan Yu, Junzhi Xue, Ni Song, Pengyi Chen, Qi Chen, Quansheng Li, Rui Tao, Shenghai Gong, Shenhang Lu, Tianqi Shen, Tianxiang Zhu, Tiehan Kang, Tingyu Li, Wendi Wu, Xiao Shen, Xiao Zhou, Xiaotao Zhang, Xinrong Li, Xuankun Yang, Xun Zhang, Yan Li, Ye Lu, Yi Wang, Yibo Zhou, Yichi Zhang, Yihao Sun, Yijun Huang, Yixin Zhu, Yixuan Wu, Yuchen Sun, Yue Wu, Yuheng Sun, Yukun Li, Yutian Tu, Yuxuan Qin, Yuzhuo Wu, Zeyu Li, Zhengyu Lou, Zhenning Ran, Zizhu He, Pengfei Liu

General AI

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' real academic workflows…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.4

Automated In-the-Wild Data Collection for Continual AI Generated Image Detection

2026-05-04 · Thanasis Pantsios, Dimitrios Karageorgiou, Christos Koutlis, George Karantaidis, Olga Papadopoulou, Symeon Papadopoulos

Research Track A · General AI

The rapid advancement of generative Artificial Intelligence (AI) has introduced significant challenges for reliable AI-generated image detection. Existing detectors often suffer from performance degradation under distribution shifts and when encountering newly emerging generative models. In this work, we propose a data…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.2

AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development

2026-05-04 · Yuecai Zhu, Nikolaos Tsantalis, Peter C. Rigby

General AI

The promise of Large Language Models in automated software engineering is often measured by functional correctness, overlooking the critical issue of long-term maintainability. This paper presents a systematic audit of technical debt in AI-generated software, revealing that AI does not eliminate flaws but rather introd…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.2

EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs

2026-05-04 · Ruichao Liang, Jing Chen, Xianglong Li, Huangpeng Gu, Yebo Feng, Yue Xue, Cong Wu, Yang Liu

General AI

Smart contract vulnerabilities in Decentralized Finance cause billions of dollars in losses every year, yet the security community faces a critical bottleneck: identifying a vulnerability is not the same as proving it is exploitable. Manual PoC construction is prohibitively labor-intensive, leaving most disclosed vu…

Review: pending · Role: unreviewed · Read: now

huggingface Score 13.4

Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

2026-05-01 · Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin

General AI

Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.2

Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection

2026-05-04 · Mohamad Khajezade, Fatemeh H. Fard, Mohamed Sami Shehata

General AI

Cross-language code clone detection (X-CCD) is challenging because semantically equivalent programs written in different languages often share little surface similarity. Although large language models (LLMs) have shown promise for semantic clone detection, their use as black-box systems raises concerns about cost, repr…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.2

Visual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMs

2026-05-04 · Xin Zhang, Qiqi Tao, Jiawei Du, Moyun Liu, Joey Tianyi Zhou

General AI

Continuous latent-space reasoning offers a compact alternative to textual chain-of-thought for multimodal models, enabling high-dimensional visual evidence to be integrated without explicit reasoning tokens. However, we identify a previously overlooked optimization pathology in existing latent visual reasoning methods:…

Review: pending · Role: unreviewed · Read: now

arxiv Score 12.4

Autonomous Drift Learning in Data Streams: A Unified Perspective

2026-05-02 · Xiaoyu Yang, En Yu, Jie Lu

Research Track A

In the pursuit of autonomous learning systems, the foundational assumption of stationarity, the premise that data distributions and model behaviors remain constant, is fundamentally untenable. Historically, the research community has addressed non-stationary environments almost exclusively under the scope of concept dr…

Review: pending · Role: unreviewed · Read: now

arxiv Score 12.2

SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

2026-05-04 · Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong

General AI

Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, assigning high rewards to flawed reasoni…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.4

A Compound AI Agent for Conversational Grant Discovery

2026-05-04 · Zhisheng Tang, Mayank Kejriwal

Research Track B · General AI

Research funding discovery remains fundamentally fragmented: researchers navigate disparate agency portals (e.g., in the United States, NSF, NIH, DARPA, Grants.gov, and many others) with heterogeneous interfaces, search capabilities, and data schemas. We present a compound AI system that unifies this landscape through …

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.2

CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation

2026-04-30 · Andac Demir, Erik W. Anderson, Jeremy L. Jenkins, Srayanta Mukherjee

General AI

In this work, we introduce CellxPert, a scalable multimodal foundation model that unifies single-cell and spatial multi-omics within a common representation space. CellxPert jointly encodes transcriptomic (scRNA-seq), chromatin-accessibility (ATAC-seq), and surface-proteomic (CITE-seq) measurements, while directly inco…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 11.2

GIFT: Guided Fine-Tuning and Transfer for Enhancing Instruction-Tuned Language Models

2026-05-02 · Zhiwen Ruan, Yichao Du, Jianjie Zheng, Longyue Wang, Yun Chen, Peng Li, Jinsong Su, Yang Liu, Guanhua Chen

General AI

A promising paradigm for adapting instruction-tuned language models is to learn task-specific updates on a pretrained base model and subsequently merge them into the instruction-tuned model. However, existing approaches typically treat the instruction-tuned model as a passive target that is only involved at the final m…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.2

Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning for Speech and Reasoning Tasks

2026-05-03 · Zongqian Li, Yixuan Su, Han Zhou, Zihao Fu, Nigel Collier

General AI

Parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) have become essential for deploying large language models, yet their static parameter allocation remains suboptimal for inputs of varying complexity. We present Flexi-LoRA, a novel framework that dynamically adjusts LoRA ranks based on input comple…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.2

TrajRAG: Retrieving Geometric-Semantic Experience for Zero-Shot Object Navigation

2026-05-03 · Yiyao Wang, Sixian Zhang, Keming Zhang, Xinhang Song, Songjie Du, Shuqiang Jiang

General AI

Existing zero-shot Object Goal Navigation (ObjectNav) methods often exploit commonsense knowledge from large language or vision-language models to guide navigation. However, such knowledge arises from internet-scale text rather than embodied 3D experience, and episodic observations collected during navigation are typic…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.2

AlbumFill: Album-Guided Reasoning and Retrieval for Personalized Image Completion

2026-05-04 · Yu-Ju Tsai, Brian Price, Qing Liu, Luis Figueroa, Daniil Pakhomov, Zhihong Ding, Scott Cohen, Ming-Hsuan Yang

General AI

Personalized image completion aims to restore occluded regions in personal photos while preserving identity and appearance. Existing methods either rely on generic inpainting models that often fail to maintain identity consistency, or assume that suitable reference images are explicitly provided. In practice, suitable …

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.2

Bolek: A Multimodal Language Model for Molecular Reasoning

2026-05-04 · Frederic Grabowski, Jacek Szczerbiński, Maciej Jaśkowski, Kalina Jasińska-Kobus, Paweł Dąbrowski-Tumański, Tomasz Jetka, Bartosz Topolski

General AI

Molecular property models increasingly support high-stakes drug-discovery decisions, but their outputs are often difficult to audit: classical predictors return scores without rationale, while language models can produce fluent explanations weakly grounded in the input molecule. We introduce Bolek, a compact multimodal…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.2

DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation

2026-05-04 · Danil Tokhchukov, Veronika Morozova, Gonzalo Ferrer

General AI

Traditional Simultaneous Localization and Mapping (SLAM) algorithms rely heavily on the static environment assumption, which severely limits their applicability in real-world spaces populated by moving entities, such as pedestrians. In this work, we propose DynoSLAM, a tightly-coupled Dynamic GraphSLAM architecture tha…

Review: pending · Role: unreviewed · Read: now

arxiv Score 10.2

Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense

2026-05-04 · Mingming Zha, Xiaofeng Wang

General AI

Autonomous LLM agents operate as long-running processes with persistent workspaces, memory files, scheduled task state, and messaging integrations. These features create a new propagation risk: attacker-influenced content can be written into persistent agent state, re-enter the LLM decision context through scheduled au…

Review: pending · Role: unreviewed · Read: now

arxiv Score 10.2

FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents

2026-05-04 · Quang Hieu Pham, Yang He, Ping Nie, Canwen Xu, Davood Rafiei, Yuepeng Wang, Xi Ye, Jocelyn Qiaochu Chen

General AI

Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline where schema elements are retrieved once upfront and the database is only revisited for post-hoc repair, limiting recovery fr…

Review: pending · Role: unreviewed · Read: now

arxiv Score 10.2

OphMAE: Bridging Volumetric and Planar Imaging with a Foundation Model for Adaptive Ophthalmological Diagnosis

2026-05-04 · Tienyu Chang, Zhen Chen, Renjie Liang, Jinyu Ding, Jie Xu, Sunu Mathew, Amir Reza Hajrasouliha, Andrew J. Saykin, Ruogu Fang, Yu Huang, Jiang Bian, Qingyu Chen

General AI

The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance…

Review: pending · Role: unreviewed · Read: now

arxiv Score 10.2

TRACE: Temporal Reasoning over Context and Evidence for Activity Recognition in Smart Homes

2026-05-04 · Yingtian Shi, Abivishaq Balasubramanian, Jessica Herring, Jiachen Li, Juan Macias Romero, Rosemarie Santa Gonzalez, Varun Mishra, Agata Rozga, Xiang Zhi Tan, Thomas Plötz

General AI

Human activity recognition (HAR) in smart homes remains challenging because many daily activities exhibit similar local sensor patterns, while minimally intrusive sensing provides sparse and ambiguous observations. As a result, methods based on short temporal or event windows often fail to capture the broader temporal …

Review: pending · Role: unreviewed · Read: now

arxiv Score 10.2

When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

2026-05-04 · Pehuén Moure, Niclas Pokel, Bilal Bounajma, Yingqiang Gao, Roman Boehringer, Longbiao Cheng, Shih-Chii Liu

General AI

Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models can make use of such information. We int…

Review: pending · Role: unreviewed · Read: now

huggingface Score 9.4

Perceptual Flow Network for Visually Grounded Reasoning

2026-05-04 · Yangfu Li, Yuning Gong, Hongjian Zhan, Teng Li, Yuanhuiyi Lyu, Tianyi Chen, Qi Liu, Ziyuan Huang, Zhihang Zhong, Dandan Zheng, Yue Lu

General AI

Despite the success of Large-Vision Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods introduce geometric priors from visual experts as additional supervision. However, we obs…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.2

Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters

2026-05-04 · Lingxiao Kong, Cong Yang, Oya Deniz Beyan, Zeyd Boukhers

General AI

Despite significant advances in Reinforcement Learning (RL), model performance remains highly sensitive to algorithm and hyperparameter configurations, while generalization gaps across environments complicate real-world deployment. Although prior work has studied RL generalization, the relative contribution of specific…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.2

HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems

2026-05-04 · Vicente Pelechano, Antoni Mestre, Manoli Albert, Miriam Gil

General AI

Deciding how to distribute work between humans and AI systems is a central challenge in organisational design. Most approaches treat this as a binary choice, yet the operational reality is richer: humans and AI routinely share tasks or take complementary roles depending on context, fatigue, and the stakes involved. Gov…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.2

MolmoAct2: Action Reasoning Models for Real-world Deployment

2026-05-04 · Haoquan Fang, Jiafei Duan, Donovan Clay, Sam Wang, Shuo Liu, Weikai Huang, Xiang Fan, Wei-Chuan Tsai, Shirui Chen, Yi Ru Wang, Shanli Xing, Jaemin Cho, Jae Sung Park, Ainaz Eftekhar, Peter Sushko, Karen Farley, Angad Wadhwa, Cole Harrison, Winson Han, Ying-Chun Lee, Eli VanderBilt, Rose Hendrix, Suveen Ellawela, Lucas Ngoo, Joyce Chai, Zhongzheng Ren, Ali Farhadi, Dieter Fox, Ranjay Krishna

General AI

Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency fo…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.2

Virtual Scanning for NSCLC Histology: Investigating the Discriminatory Power of Synthetic PET

2026-05-04 · Fatih Aksu, Laura Ciuffetti, Francesco Di Feola, Filippo Ruffini, Giulia Romoli, Fabrizia Gelardi, Arturo Chiti, Valerio Guarrasi, Paolo Soda

General AI

Accurate histological differentiation between adenocarcinoma (ADC) and squamous cell carcinoma (SCC) is critical for personalized treatment in non-small cell lung cancer (NSCLC). While [¹⁸F]FDG PET/CT is a standard tool for the clinical evaluation of lung cancer, its utility is often limited by high costs and radi…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 8.4

Code World Model Preparedness Report

2026-05-01 · Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd, Nathaniel Li, Ziwen Han, Jean-Christophe Testud, Saisuke Okabayashi, Maeve Ryan, Jinpeng Miao, Hamza Kwisaba, Felix Binder, Spencer Whitman, Jim Gust, Esteban Arcaute, Dhaval Kapil, Jacob Kahn, Ayaz Minhas, Tristan Goodman, Lauren Deason, Alexander Vaughan, Shengjia Zhao, Summer Yue

General AI

This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned pro…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 8.4

From Context to Skills: Can Language Models Learn from Context Skillfully?

2026-05-03 · Shuzheng Si, Haozhe Zhao, Yu Lei, Qingyi Wang, Dingwei Chen, Zhitong Wang, Zhenhailong Wang, Kangyang Luo, Zheng Wang, Gang Chen, Fanchao Qi, Minjia Zhang, Maosong Sun

General AI

Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 8.2

SCALE-LoRA: Auditing Post-Retrieval LoRA Composition with Residual Merging and View Reliability

2026-05-02 · Shuaipeng Zhou, Yu Zhang

General AI

Libraries of Low-Rank Adaptation (LoRA) adapters are becoming a practical by-product of parameter-efficient adaptation. Once such adapters accumulate, a natural question is no longer how to train one adapter for one task, but how to reuse an open pool of adapters for a new task given only a small support set. Prior wor…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 8.2

FunFuzz: An LLM-Powered Evolutionary Fuzzing Framework

2026-05-04 · Mario Rodríguez Béjar, B. Romera-Paredes, Jose L. Hernández-Ramos

General AI

Modern fuzzers increasingly use Large Language Models (LLMs) to generate structured inputs, but LLM-driven fuzzing is sensitive to prompt initialization and sampling variance, which can reduce exploration efficiency and lead to redundant inputs. We present FunFuzz, a multi-island evolutionary fuzzing framework that run…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 8.2

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

2026-05-04 · Shikhar Shukla

General AI

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length γ, which determines how many tokens the draft model proposes per step. Nearly all exis…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 7.4

Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling

2026-04-30 · Ansar Aynetdinov, Patrick Haller, Alan Akbik

General AI

Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for high-resource non-English languages like German, French, or Japanese, aggressive filtering creates a strategic dilemma: should practitioners prioritize diversity by tra…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 7.4

Generative Modeling with Orbit-Space Particle Flow Matching

2026-05-04 · Sinan Wang, Jinjin He, Shenyifan Lu, Ruicheng Wang, Greg Turk, Bo Zhu

General AI

We present Orbit-Space Geometric Probability Paths (OGPP), a particle-native flow-matching framework for generative modeling of particle systems. OGPP is motivated by two insights: (i) particles are defined up to permutation symmetries, so anonymous indexing inflates per-index target variance and yields curved, hard-to…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 7.2

ProPACT: A Proactive AI-Driven Adaptive Collaborative Tutor for Pair Programming

2026-05-04 · Anahita Golrang, Kshitij Sharma, Olga Viberg

General AI

Effective pair programming depends on coordination of attention, cognitive effort, and joint regulation over time, yet most adaptive learning systems remain individual-centric and reactive. This paper introduces ProPACT, a proactive AI-driven adaptive collaborative tutor that treats collaboration itself as the object o…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 6.4

MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

2026-04-30 · Jona te Lintelo, Lichao Wu, Marina Krček, Sengim Karayalçin, Stjepan Picek

General AI

Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs) have significantly reduced inference costs through sparse activation. However, this sparse activation paradigm also introduces new safety challenges. Since only a subset of experts is engaged for each input, model behavior becomes coupled to routing…

Review: pending · Role: unreviewed · Read: later

huggingface Score 6.4

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

2026-05-01 · Minchan Kwon, Sunghyun Baek, Minseo Kim, Jaemyung Yu, Dongyoon Han, Junmo Kim

General AI

Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important, but achieving both is challenging. Generative Flow Networks (GFNs) that perform distribution matching are a promis…

Review: pending · Role: unreviewed · Read: later

arxiv Score 6.2

SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 25+ Sign Languages

2026-05-03 · Sen Fang, Hongbin Zhong, Yanxin Zhang, Dimitris N. Metaxas

General AI

Existing large-scale sign language resources typically provide supervision only at the level of raw video-text alignment and are often produced in laboratory settings. While such resources are important for semantic understanding, they do not directly provide a unified interface for open-world recognition and translati…

Review: pending · Role: unreviewed · Read: later

huggingface Score 5.4

ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

2026-04-29 · Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang

General AI

In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, additional attributes are combined to associate with data samples. We show that the space spanned b…

Review: pending · Role: unreviewed · Read: later

arxiv Score 5.2

Equilibrium Stability and Uniqueness with a Large Number of Commodities and Patient Consumers

2026-05-04 · Xinyang Wang

General AI

We show that a large effective number of commodities can be a source of equilibrium stability and uniqueness: expanding substitution opportunities strengthens aggregate substitution effects. We study finite dated-commodity exchange economies obtained by truncating a countably infinite-horizon environment with discounte…

Review: pending · Role: unreviewed · Read: later

arxiv Score 5.2

Private Speech Classification without Collapse: Stabilized DP Training and Offline Distillation

2026-05-04 · Yadi Wen, Tianxin Li, Enji Liang, Rong Du, Yue Fu

General AI

We study example-level private supervised speech classification under a practical release constraint: training may access privileged side information, but the released model must be audio-only. This setting is important because speech systems can often exploit richer side information during development, whereas deploym…

Review: pending · Role: unreviewed · Read: later

arxiv Score 5.2

Uncountably many conditionally inaccessible decisions exist in every finite probability space

2026-05-04 · Zalán Gyenis, Miklós Rédei, Leszek Wroński

General AI

In a recent paper [Redei-Jing2026], the notion of conditional p-inaccessibility of a decision based on utility maximization was defined and examples of conditionally p-inaccessible decisions were given. The conditional inaccessibility of a decision based on maximizing utility calculated by a probability measure…

Review: pending · Role: unreviewed · Read: later

arxiv Score 5.2

When Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIs

2026-05-04 · Haorui Li, Zhenghui He, Xuanzi Liu, Yang Xu, Dongsheng Liu, Jiakang Ma, Lupan Wu, Yangjie Wu, Xiongchao Tang, Tianhui Shi

General AI

Open-weight large language models (LLMs) are often described as downloadable model artifacts, but in production they are increasingly consumed as hosted APIs. This paper studies the intermediary service layer that turns a model release into an operational endpoint. Using sampled request logs, provider metadata, compati…

Review: pending · Role: unreviewed · Read: later

huggingface Score 5.0

Soft Anisotropic Diagrams for Differentiable Image Representation

2026-04-27 · Laki Iinbor, Zhiyang Dou, Wojciech Matusik

General AI

We introduce Soft Anisotropic Diagrams (SAD), an explicit and differentiable image representation parameterized by a set of adaptive sites in the image plane. In SAD, each site specifies an anisotropic metric and an additively weighted distance score, and we compute pixel colors as a softmax blend over a small per-pixe…

Review: pending · Role: unreviewed · Read: later