Research Paper Cockpit

Daily Digest - 2026-07-03

Papers first seen in this daily snapshot.

Research Workflow

Latest digest: 2026-07-04.

Papers

61 visible entries

huggingface Score 22.8

AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

2026-07-02 · Xiangchen Cheng, Yunwei Jiang, Jianwen Sun, Zizhen Li, Chuanhao Li, Xiangcheng Cao, Yihao Liu, Fanrui Zhang, Li Jin, Kaipeng Zhang

General AI

Memory for a long-horizon LLM agent is a contract about what each future decision is allowed to see. The simplest contract appends past observations, tool calls, and reflections to every prompt, which makes prior context easy to access but also turns it into a jumbled mixture in which the effect of any single memory co…

Review
pending
Role
unreviewed
Read
now
huggingface Score 22.4

When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

2026-06-26 · Yiling Tao, Shihan Deng, Meiling Tao, Pengzhi Wei, Zhichao Hu, Zhihao Zhu

General AI

Search agents powered by large language models (LLMs) are increasingly used to solve complex information-seeking tasks, requiring multi-step retrieval and reasoning to fulfill user goals. However, existing benchmarks often assume that user queries are complete and explicit, overlooking the fact that real-world search r…

Review
pending
Role
unreviewed
Read
now
arxiv Score 22.3

Hidden Forgetting in Continual Multimodal Learning: When Accuracy Survives but Grounding Fails

2026-07-02 · Qianyu Chen, Canran Xiao, Runxuan Tang

Research Track A · General AI

Multimodal large language models must continually adapt to evolving tasks and domains, yet standard continual learning metrics mainly measure whether old answers remain correct, leaving the stability of multimodal grounding largely unexamined. We study this overlooked failure mode and ask whether a continually adapted …

Review
pending
Role
unreviewed
Read
now
huggingface Score 21.0

Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning

2026-06-30 · Junha Jung, Minbyul Jeong, Suhyeon Lim, Sungwook Jung, Jaehoon Yun, Taeyun Roh, Mujeen Sung, Jaewoo Kang

General AI

Recent multimodal large language models have shown great promise in clinical image reasoning, but existing post-training pipelines remain predominantly outcome-centric, relying on final answer correctness or sequence-level preferences. This suffers from sparse credit assignment, making it difficult to optimize the reas…

Review
pending
Role
unreviewed
Read
now
arxiv Score 20.6

Visually Grounded Self-Reflection for Vision-Language Models via Reinforcement Learning

2026-07-02 · Liyan Tang, Fangcong Yin, Greg Durrett

General AI

Large vision-language models can reason over multimodal inputs by generating textual chains of thought (CoT). A key capability exhibited in CoT reasoning is self-reflection: revisiting earlier decisions and correcting previous errors. However, existing LVLMs often fail to properly attend to visual inputs during reflect…

Review
pending
Role
unreviewed
Read
now
arxiv Score 19.6

ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning

2026-07-02 · Yanjun Zhao, Ruizhong Qiu, Tianxin Wei, Yuanchen Bei, Zhining Liu, Lingjie Chen, Ismini Lourentzou, Hanghang Tong, Jingrui He

General AI

Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they often fail to use relevant evidence that is already present in the input, revealing a gap between context…

Review
pending
Role
unreviewed
Read
now
arxiv Score 18.8

Denser ≠ Better: Limits of On-Policy Self-Distillation for Continual Post-Training

2026-07-02 · Meng Wang, Haohan Zhao, Wenzhuo Liu, Lu Yang, Geng Liu, Haiyang Guo, Guo-Sen Xie, Gaofeng Meng, Hongbin Liu, Fei Zhu

Research Track A · General AI

Continual post-training enables foundation models to acquire new knowledge while preserving existing capabilities. Recent work suggests that on-policy learning can mitigate forgetting, with on-policy self-distillation emerging as a particularly attractive approach. In this work, we revisit this optimistic view through …

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.8

Geometry-Preserving Orthonormal Initialization for Low-Rank Adaptation in RLVR

2026-06-30 · Ruijia Zhang, Jiacheng Zhu, Hanqing Zhu, Laixi Shi

General AI

Low-rank adaptation (LoRA) and its variants enable parameter-efficient fine-tuning of large language models under the supervised fine-tuning (SFT) paradigm. However, their efficacy and behavior under reinforcement learning with verifiable rewards (RLVR) are less well understood. In particular, two structurally initiali…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.6

DemoPSD: Disagreement-Modulated Policy Self-Distillation

2026-07-02 · Yunhe Li, Hao Shi, Wenhao Liu, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Shuang Qiu, Linqi Song

General AI

On-policy self-distillation (OPSD) has emerged as a practical method for training large language models (LLMs) to reason, where a single model acts as both the teacher and the student with different levels of information access. However, recent studies have found that the teacher's dense token-level supervision, condit…

Review
pending
Role
unreviewed
Read
now
arxiv Score 17.6

VisionAId: An Offline-First Multimodal Android Assistant for People with Visual Impairment, Featuring Personalized Object Retrieval

2026-07-02 · Cristian-Gabriel Florea, Stelian Spînu

General AI

Over 285 million people worldwide live with a visual impairment, for whom everyday tasks such as avoiding obstacles, locating personal belongings, recognizing familiar faces, or handling cash remain persistent obstacles to personal autonomy. Existing assistive applications are typically limited to recognizing predefine…

Review
pending
Role
unreviewed
Read
now
huggingface Score 16.8

PACE: A Proxy for Agentic Capability Evaluation

2026-07-02 · Yueqi Song, Lintang Sutawika, Jiarui Liu, Lindia Tjuatja, Jiayi Geng, Yunze Xiao, Daniel Lee, Aditya Bharat Soni, Vincent Lo, Xiang Yue, Graham Neubig

General AI

Evaluating LLM agents on benchmarks like SWE-Bench and GAIA can be expensive and time-consuming, and it requires complex infrastructure. A single evaluation can cost thousands of dollars and take days to complete. In contrast, non-agentic LLM benchmarks that test individual capabilities (e.g., reasoning, code generation) are…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.6

Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

2026-07-02 · Yuxuan Li, Lingxi Xie, Xinyue Huo, Jihao Qiu, Jiacheng Shao, Pengfei Chen, Jiannan Ge, Kaiwen Duan, Qi Tian

General AI

Long-form TV dramas present a formidable challenge for comprehensive video understanding, where deciphering complex storylines often relies on speaker recognition, the task of accurately attributing each spoken utterance to its respective character. In this paper, we advance this field through two primary contr…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.6

Seek to Segment: Active Perception for Panoramic Referring Segmentation

2026-07-02 · Song Tang, Shuming Hu, Xincheng Shuai, Henghui Ding, Yu-Gang Jiang

General AI

Existing referring segmentation models passively process static images captured from fixed perspectives, limiting their applicability in Embodied AI, where agents must perform active perception in continuous 360° environments. To bridge this gap, we introduce a novel task: Active Panoramic Referring Segmenta…

Review
pending
Role
unreviewed
Read
now
arxiv Score 16.6

Will Scaling Improve Social Simulation with LLMs?

2026-07-02 · Caleb Ziems, William Held, Su Doga Karaca, David Grusky, Tatsunori Hashimoto, Diyi Yang

General AI

Large Language Model (LLM) social simulations are a promising research method, but they are not yet faithful enough to be adopted widely. In this work, we investigate whether the current scaling paradigm in language modeling is likely to close these gaps, or whether simulation fidelity is orthogonal to general capabili…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.8

ASPIRE: Agentic /Skills Discovery for Robotics

2026-06-30 · Runyu Lu, Yubo Wu, Ethan Kou, Letian Fu, Wenli Xiao, Ajay Mandlekar, Yinzhen Xu, Guanya Shi, Ken Goldberg, Ang Chen, Mosharaf Chowdhury, Yuke Zhu, Linxi "Jim" Fan, Guanzhi Wang

Research Track A · General AI

Traditional robot programming is challenging: it requires orchestrating multimodal perception, managing physical contact dynamics, and handling diverse configurations and execution failures. We introduce ASPIRE (Agentic Skill Programming through Iterative Robot Exploration), a continual learning system that autonomousl…

Review
pending
Role
unreviewed
Read
now
huggingface Score 15.8

EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

2026-07-02 · Zhilin Wang, Han Song, Runzhe Zhan, Jusen Du, Jiacheng Chen, Tianle Li, Qingyu Yin, Yulun Wu, Zhennan Shen, Tong Zhu, Yanshu Li, Guanjie Chen, Derek F. Wong, Yafu Li, Yu Cheng, Yang Yang

General AI

Autonomous agents are increasingly expected to improve executable policies through feedback, yet existing evaluations often collapse this process into a final score or confound it with open-ended software-engineering progress. We introduce Autonomous Policy Evolution, a controlled evaluation setting in which a harness-…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.6

Multi-Head Recurrent Memory Agents

2026-07-01 · Jiatong Li, Samuel Yeh, Sharon Li

Research Track A · General AI

Recurrent memory agents extend LLMs to arbitrarily long contexts by iteratively consolidating input into a fixed-size memory window. Despite their scalability, these agents exhibit a well-documented reliability problem: end-to-end performance degrades systematically as context length grows. We diagnose this failure by …

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.6

EAGLE-360: Embodied Active Global-to-Local Exploration in 360°

2026-07-02 · Jingtao Xu, Zizhuo Lin, Jianwen Sun, Yi Yang, Yawei Luo

General AI

While Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in standard visual understanding, adapting them for active visual search in 360° panoramic environments exposes fundamental limitations. Specifically, standard MLLMs struggle to effectively model inherent panoramic properti…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.6

Hawk: Harnessing Hardware-Aware Knowledge for High-Performance NPU Kernel Generation

2026-07-02 · Junyi Wen, Ruiyan Zhuang, Yongjia Xu, Pengtu Li, Rui Zou, Hongyi Chen, Chingman Wan, Puxu Yang, Wuhui Chen, Yanlin Wang

General AI

Developing high-performance kernels for Neural Processing Units (NPUs) is a critical industry bottleneck, requiring developers to manually navigate implicit hardware constraints and strict memory hierarchies. While large language models offer immense automation potential, they fail catastrophically on NPUs due to a fun…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.6

Learning to Evolve Scenes: Reasoning about Human Activities with Scene Graphs

2026-07-02 · Francesca Pistilli, Simone Alberto Peirone, Giuseppe Averta

General AI

Understanding human behavior while interacting with the surrounding world is crucial for many applications of embodied AI. First-person videos are particularly informative for this problem, as they capture well how activities reshape the scene over time. However, existing approaches often rely on implicit visual or lan…

Review
pending
Role
unreviewed
Read
now
arxiv Score 15.6

Many Voices, One Reward: Multi-Role Rubric Generation for LLM Judging and Reward Modeling

2026-07-02 · Dazhi Fu, Jiuding Yang, Yiwen Guo, Jicong Fan

General AI

Reliable reward and preference signals are critical for evaluating and optimizing large language models on open-ended tasks. Rubric-based judges offer a transparent way to decompose such judgments into explicit evaluation criteria, but existing annotation-free rubric generators typically rely on a single generic evalua…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

ISM: Self-Improving Strategy Memory for Continual Mathematical Reasoning

2026-06-30 · Prakhar Dixit, Tim Oates

Research Track A · General AI

We propose Intelligent Schema Memory (ISM), a self-evolving memory-augmented system that improves mathematical reasoning for a frozen LLM under continual learning with hard episodic resets. ISM maintains a compact, self-refined bank of strategy schemas learned from both successful and failed episodes, with symbolic too…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.8

Knowledge Distillation from Large Reasoning Models to Compact Student Models: A Case Study on the John O'Bryan Mathematics Competition

2026-06-30 · Gaurab Baral, Aaditya Khanal, Yangyang Tao, Junxiu Zhou

General AI

This paper investigates knowledge distillation from a large reasoning model (DeepSeek-R1) to a compact student model (Qwen2.5-7B). Using historical problems from the John O'Bryan Mathematics Competition at Northern Kentucky University (2011-2025), we build a Chain-of-Thought (CoT) training corpus through a dual-agent f…

Review
pending
Role
unreviewed
Read
now
arxiv Score 14.6

AgentsCAD: Automated Design for Manufacturing of FDM Parts via Multi-Agent LLM Reasoning and Geometric Feature Recognition

2026-07-02 · Emmanuel George, Christopher Keefe, Peter Pak, Amir Barati Farimani

General AI

Parts manufactured with Fused Deposition Modeling (FDM) often require Design for Additive Manufacturing (DFAM) modifications to ensure printability, structural integrity, and reduced post-processing. Current slicers identify defects such as steep overhangs but are unable to modify the underlying geometry. This work pre…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.6

RetailSMV: Exocentric vs. Egocentric Adaptation of Foundation Video World Models in Retail

2026-07-01 · Amirreza Rouhi, Rajat Aggarwal, Parikshit Sakurikar, Anoop M. Namboodiri, Sashi P. Reddi

General AI

Foundation video diffusion models are increasingly viewed as world simulators for embodied agents, yet their pretraining on internet-scale generic video leaves them poorly aligned with real-world deployment domains. We study parameter-efficient adaptation of a pretrained foundation video world model to retail scenes: w…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.6

Alignment Is All You Need For X-to-4D Generation

2026-07-02 · Qiaowei Miao, Kehan Li, Yawei Luo, Yi Yang

General AI

Generative diffusion models excel at synthesizing high-quality images, videos, and 3D content under multimodal control. However, arbitrary user-defined modality-to-4D (X-to-4D) generation remains challenging due to the high cost of constructing diverse datasets and the limited scalability of existing methods. This pape…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.6

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

2026-07-02 · Jiale Amber Wang, Kaiyuan Wang, Pengyu Nie

General AI

Software tests and code evolve together: a code change should be followed by new or updated tests that record the new software behavior. Yet existing test generation and update benchmarks often isolate the test from the code change, and rely on static metadata that does not verify whether a test is executable or semant…

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.5

CLIMB: Centroid-Based Hierarchical Memory for Online Continual Self-Supervised Learning

2026-06-30 · Julien Lefebvre, Stefan Duffner, Mathieu Lefort

Research Track A · General AI

Online Continual Self-Supervised Learning (OCSSL) aims to learn representations from a continuous stream of unlabeled data, without knowledge of task boundaries and under memory constraints. Existing methods rely either on replay buffers that exploit latent space structure, or on regularization alone. We present CLIMB …

Review
pending
Role
unreviewed
Read
now
arxiv Score 13.3

Episodic-to-Semantic Consolidation Without Identity Drift

2026-07-02 · Xue Qin, Simin Luan, Cong Yang, Zhijun Li

Research Track A · General AI

Long-running adaptive intelligent agents face a structural tension between knowledge consolidation and information integrity. Memory consolidation is conventionally treated as an agent-changing operation: a model is fine-tuned, a prompt rewritten, a policy distilled, or a reflection appended to the context that governs…

Review
pending
Role
unreviewed
Read
now
huggingface Score 12.8

AnyGroundBench: A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models

2026-07-02 · Rintaro Otsubo, Ryo Fujii, Reina Ishikawa, Taiki Kanaya, Kanta Sawafuji, Hiroki Kajita, Shigeki Sakai, Hideo Saito, Ryo Hachiuma

Research Track A · General AI

Vision-Language Models (VLMs) have demonstrated immense promise in Spatio-Temporal Video Grounding (STVG). However, current evaluation protocols are largely confined to zero-shot assessments on general, daily-life benchmarks. This creates a critical disconnect from real-world applications in specialized fields, where m…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.6

Bayesian Sparse Low-Rank Adaptation for Large Language Model Uncertainty Estimation

2026-07-02 · Jijie Zhang, Zhe Ren, Quan Zhang, Dandan Guo

General AI

Large language models (LLMs) exhibit remarkable reasoning capabilities, but their task-specific fine-tuning is notoriously plagued by overconfidence, severely hindering trustworthy deployment. We propose Data-Adaptive Lower-Rank Adaptation (DALorRA), a simple and effective variational Bayesian sparse framework that shi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.6

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

2026-07-02 · Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng

General AI

Many everyday programming tasks resist clean rule-based implementation, such as alerting on important log lines, repairing malformed JSON, or ranking search results by intent, and are increasingly outsourced to large language model APIs at the cost of locality, reproducibility, and price. We propose fuzzy-function prog…

Review
pending
Role
unreviewed
Read
now
arxiv Score 12.3

SAOT: Self-Supervised Continual Graph Learning with Structure-Aware Optimal Transport

2026-07-01 · Yuting Zhang, Yanbei Liu, Zhitao Xiao, Lei Geng, Yanwei Pang, Xiao Wang

Research Track A · General AI

Self-supervised Continual Graph Learning (CGL) aims to successively learn from a graph sequence with different tasks without label supervision - a paradigm that has attracted widespread attention. Most existing self-supervised CGL methods rely on instance-level consistency objectives that enforce stability of individua…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.6

Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning

2026-07-02 · Xuehui Wang, Xuankun Yang, Wei Shen

General AI

Visual token pruning is a crucial strategy for accelerating VLMs by compressing redundant image patches, yet existing methods often fail to preserve critical cues under dense instructions and fine-grained queries. In this paper, we investigate this failure and identify two underlying bottlenecks: the widespread dispers…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.6

Controllable Sim Agents with Behavior Latents

2026-07-02 · Juanwu Lu, Junyu Zhu, Ziran Wang

General AI

Realistic traffic simulation requires agents that imitate logged behavior and can also be steered along interpretable axes. Such controllability enables engineers to isolate variables, reproduce specific edge cases, and test autonomous systems without real-world risk. We introduce Controllable Neural Variational Agents…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.6

Neuron-Aware Data Selection for Annotation-Free LLM Self-Distillation

2026-07-02 · Zhuowei Chen, Xiang Lorraine Li

General AI

Post-training large language models (LLMs) without real-world interaction feedback or human-labeled supervision remains challenging, particularly in specialized domains where expert annotations are costly to obtain. Recent annotation-free self-evolution methods address this by using the model's own outputs as supervisi…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.6

What LLM Agents Say When No One Is Watching: Social Structure and Latent Objective Emergence in Multi-Agent Debates

2026-07-02 · Arman Ghaffarizadeh, Danyal Mohaddes, Aliakbar Izadkhah, Shahriar Noroozizadeh

General AI

LLM agents will increasingly act in socially structured settings where role, audience, and relational context can shape what is advantageous or costly to say. We study whether such social structure, without any explicit objective in the prompt, changes what an agent expresses publicly relative to an off-the-record (OTR…

Review
pending
Role
unreviewed
Read
now
arxiv Score 11.1

QFedAgent: Quantum-Enhanced Personalized Federated Learning for Multi-Agent Activity Recognition

2026-07-02 · Quoc Bao Phan, Tuy Tan Nguyen

General AI

Federated learning (FL) enables collaborative model training across distributed devices without sharing raw data, making it suitable for privacy-sensitive robotic sensing applications. However, multi-agent systems generate heterogeneous and non-independent and identically distributed (non-IID) multimodal sensor streams…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 11.0

Morphing into Hybrid Attention Models

2026-06-29 · Disen Lan, Jianbin Zheng, Yuxi Ren, Xin Xia, Xuanda Wang, Xuefeng Xiao, Xipeng Qiu, Yu Cheng

General AI

Hybrid attention models improve long-context efficiency by retaining only a subset of full-attention layers and replacing the remaining layers with linear attention. However, the effectiveness of Transformer-to-hybrid conversion critically depends on which layers preserve full attention. Existing hybrid layer selection…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.6

Distributed Attacks in Persistent-State AI Control

2026-07-02 · Josh Hills, Ida Caspary, Asa Cooper Stickland

General AI

As AI coding agents become more autonomous, they increasingly ship code iteratively, with the codebase persisting across sessions. This persistence creates a new attack surface: a misaligned or prompt-injected agent can distribute attacks across pull requests (PRs) and time its payload for the PR with the best natural …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.6

Human Capital, Not Model Benchmarks, Predicts Hybrid Intelligence in Forecasting

2026-07-02 · Vivienne Ming

General AI

Whether pairing people with AI helps or hurts is usually reported as a single average effect. Using a real-money prediction market (Polymarket) as an objective, externally resolved benchmark, this pilot shows that the value of human-AI collaboration depends on a specific, measurable form of human capital. Analyzed at t…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.6

LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

2026-07-02 · Matteo Boglioni, Thibault Rousset, Siva Reddy, Marius Mosbach, Verna Dankers

General AI

LLMs memorize sensitive training data, including personally identifiable information (PII), creating a pressing need for reliable post hoc removal methods. Unlearning has emerged as a promising solution, with state-of-the-art (SOTA) methods often following a localize-first, unlearn-second paradigm that targets specific …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 10.5

Scalable Behaviour Cloning on Browser Using via Skill Distillation

2026-06-30 · Kaisen Yang, Zheng Jiang, Yuzhao Peng, Houde Qian, Boshi Zhang, Youjie Zheng, Shijin Hong, Qingle Liu, Ruoyu Han, Bohan Lyu, Bingxiang He, Eren Cai, Calvin Xiao, Qinhuai Na

Research Track A · Research Track B · General AI

Internet users collectively perform an enormous range of skilled work through web browsers, from software development and document editing to search, forms, and enterprise workflows, making human browsing a highly scalable but under-exploited source of reusable browser skills. We argue that the bottleneck for browser a…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 10.0

Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

2026-06-27 · Yongjin Yang, Jiarui Liu, Yinghui He, Lechen Zhang, Bernhard Schölkopf, Zhijing Jin

General AI

Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.8

Discrete Diffusion Language Models for Interactive Radiology Report Drafting

2026-07-01 · Max Van Puyvelde, Halil Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert

General AI

Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, remain almost entirely autoregressive. We adapt a mixture-of-experts diffusion language mo…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 9.8

SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use

2026-07-02 · Jiayin Zhu, Kelong Mao, Yudong Guo, Dengbo He, Sulong Xu, Simiu Gu, Yutao Yue

General AI

Skills are becoming a reusable operational layer for LLM agents, encoding SOPs, domain rules, tool workflows, scripts, and validation routines. In realistic skill repositories, overlapping skills make reliable skill-use difficult. Final verifier success is too coarse for both evaluation and training, since an agent may…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.6

Next-Generation Agentic Reinforcement Learning Systems Enable Self-Evolving Agents

2026-07-01 · Ran Yan, Wei Fu, Jiale Li, Shusheng Xu, Zhiyu Mei, Jiaxuan Gao, Jiarui Zhang, Wentai Zhang, Hao Dai, Xujie Shen, Chuyi He, Zhen Pu, Jun Mei, Zhiyao Lin, Haitao Wang, Zhiqiang Ding, Jiawei Zhang, Huaijie Wang, Ruida Xu, Honghua Dong, Youhe Jiang, Yi Wu, Tongkai Yang, Binhang Yuan

General AI

LLM agents are rapidly being deployed in production, including coding assistants, customer-support chatbots, and scientific research assistants, yet they remain fundamentally static in enterprise deployment. The LLM weights, system prompts, tool repertoires, and in-context harnesses are frozen at deployment time, and a…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.6

Language Models as Measurement Apparatus for Culture

2026-07-02 · Kent K. Chang

General AI

Language models are increasingly used to quantify cultural phenomena, but what makes such measurement distinctively cultural? This paper argues that NLP work on culture is a material-discursive practice: the apparatus -- model, data, annotation, evaluation -- participates in constituting the cultural reality it measure…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 9.6

Online Safety Monitoring for LLMs

2026-07-02 · Mona Schirmer, Metod Jazbec, Alexander Timans, Christian Naesseth, Maja Waldron, Eric Nalisnick

General AI

Despite alignment training, LLMs remain prone to generating unsafe outputs at deployment time. Monitoring outputs online and raising an alarm when safety can no longer be assumed is therefore critical. We study a simple real-time monitor that turns a verifier signal from an external model into an alarm decision by thre…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.6

Adoption and Ecosystem Health: A Longitudinal Analysis of Open-Source Multi-Agent Frameworks

2026-07-02 · Xi Zhang, Papi Menon, Vivian Chu, Koray Cosguner

General AI

Since ChatGPT's launch in November 2022, open-source agentic frameworks have proliferated, making framework selection important for engineering teams while obscured by popularity signals such as GitHub stars. This paper analyzes 15 major open-source AI agent framework repositories from late 2022 to early 2026, using 80…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.6

G-RRM: Guiding Symbolic Solvers with Recurrent Reasoning Models

2026-07-02 · Timo Bertram, Sidhant Bhavnani, Richard Freinschlag, Erich Kobler, Andreas Mayr, Günter Klambauer

General AI

In this work, we focus on SE-RRMs, a symbol-equivariant instantiation of RRMs that exhibits improved extrapolation to larger problem sizes. We propose a neuro-symbolic approach, "Guiding with Recurrent Reasoning Models" (G-RRM), which integrates SE-RRMs with symbolic solvers for constraint satisfaction problems. SE-R…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.6

Learning Spectral and Polarimetric Clues for One-to-Multimodal Novel View Synthesis

2026-07-02 · Federico Lincetto, Gianluca Agresti, Mattia Rossi, Piergiorgio Sartor, Pietro Zanuttigh

General AI

Neural rendering techniques allow for accurate reconstruction of the geometry and color appearance of 3D scenes. Some methods have extended their use to additional imaging modalities, such as multispectral, infrared, or polarimetric data. However, all of these approaches require expensive sensors and calibrated setups …

Review
pending
Role
unreviewed
Read
soon
huggingface Score 6.8

From SRA to Self-Flow: Data Augmentation or Self-Supervision?

2026-07-02 · Dengyang Jiang, Mengmeng Wang, Harry Yang, Jingdong Wang

General AI

Representation alignment has become an effective way to accelerate diffusion transformer training and improve generation quality. Recent self-alignment methods, such as SRA and Self-Flow, further remove the dependency on external pretrained encoders by constructing alignment within the diffusion model itself. However, …

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.8

Learning to Move Before Learning to Do: Task-Agnostic Pretraining for VLAs

2026-07-02 · Junhao Shi, Siyin Wang, Xiaopeng Yu, Li Ji, Jingjing Gong, Xipeng Qiu

General AI

Vision-Language-Action (VLA) models are fundamentally bottlenecked by the scarcity of expert demonstrations -- triplets of observations, instructions, and actions that are costly to collect at scale. We argue that this bottleneck stems from conflating two distinct learning objectives: acquiring physical competence (how…

Review
pending
Role
unreviewed
Read
later
huggingface Score 6.8

Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling

2026-07-02 · Xingyu Zheng, Xianglong Liu, Yifu Ding, Weilun Feng, Junqing Lin, Jinyang Guo, Haotong Qin

General AI

Hardware-agnostic strategies for accelerating text-to-image diffusion, such as timestep distillation and feature caching, can reduce inference time without custom kernels or system-level optimization. Among them, multi-resolution generation strategies have recently received broad attention, attaining more than 5x speed…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

Self-Organized Learning in Oscillatory Neural Networks with Memristive Signed Couplings

2026-07-01 · Riley Acker, Aman Desai, Garrett Kenyon, Frank Barrows

General AI

Oscillatory neural networks (ONNs) have emerged as a promising neuromorphic architecture, leveraging coupled dynamical systems to perform computation and represent information through phase relationships. Their interactions can be designed to support intrinsic energy-minimizing dynamics, enabling tasks such as associat…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

PointDiT: Pixel-Space Diffusion for Monocular Geometry Estimation

2026-07-02 · Haofei Xu, Rundi Wu, Philipp Henzler, Nikolai Kalischek, Michael Oechsle, Fabian Manhardt, Marc Pollefeys, Andreas Geiger, Federico Tombari, Michael Niemeyer

General AI

State-of-the-art single-image 3D reconstruction methods often rely on complex hybrid architectures and loss functions, or compress geometry into latent spaces in order to leverage pre-trained latent diffusion models. In this work, we show that such architectural overhead and intricate loss formulations are unnecessary.…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

When Do LLM Personas Support Visualization Design? A Cross-Model Study of Color Assignment and Chart Choice

2026-07-02 · Shahreen Salim, Klaus Mueller

General AI

Large language model personas are increasingly used to approximate diverse users during early-stage visualization design, but it remains unclear whether persona-conditioned outputs reflect stable personality effects or artifacts of model choice and task framing. We examine this question across two visualization-relevan…

Review
pending
Role
unreviewed
Read
later
arxiv Score 6.6

WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory

2026-07-02 · Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Qingyan Bai, Ka Leong Cheng, Yue Yu, Yixuan Li, Yihao Meng, Zichen Liu, Yanhong Zeng, Yujun Shen, Qifeng Chen

General AI

We present WorldDirector, a highly controllable video world model framework designed for persistent dynamic object memory and unrestricted viewpoint exploration. Unlike existing world models that entangle physical dynamics with pixel rendering and rely on continuous visual observation to sustain motion, our framework e…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

FRAME: Learning the Adaptation Domain with a Mixture of Fractional-Fourier Experts

2026-06-30 · Tom Saliencro, Maya Lindqvist, Rohan Desai, Priya Nair, Daniel Whitmore

General AI

Parameter-efficient fine-tuning (PEFT) reparameterizes weight updates in a fixed basis: low-rank adapters operate in the spatial domain, while a recent line of spectral methods operates in a fixed Fourier domain. We argue that the choice of domain is itself a design degree of freedom that should be learned, and that no…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

Nonlinearity-Aware LoRA: Structured Gate Adaptation under Low-Rank Constraints

2026-06-30 · Shuai Yuan, Sudong Cai, Bingzhi Chen, Shuyuan Zheng, Chuan Xiao, Makoto Onizuka, Rui Mao

General AI

Low-rank adaptation (LoRA) is commonly viewed as an update-space approximation to full fine-tuning, yet this view is incomplete for self-gated Transformer feed-forward networks. In gated FFNs, a low-rank residual can change not only projected features but also the nonlinear selection weights that determine which channe…

Review
pending
Role
unreviewed
Read
later