Research Paper Cockpit

Daily Digest - 2026-06-13

Papers first seen in this daily snapshot.

Research Workflow

Latest digest: 2026-07-04.

Papers

56 visible entries

arxiv Score 22.3

MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning

2026-06-10 · Shang Ma, Jisheng Dang, Wencan Zhang, Yifan Zhang, Bimei Wang, Hong Peng, Bin Hu, Qi Tian, Tat-Seng Chua

General AI

We propose a multi-agent collaborative framework built upon a lightweight Multimodal Large Language Model (MLLM), specifically designed for social intelligence reasoning. A key feature of our approach is that both the training and inference phases are augmented via knowledge distillation. Within this architecture, mult…

Review pending · Role unreviewed · Read now
arxiv Score 21.3

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

2026-06-11 · Tanmoy Kanti Halder, Akash Ghosh, Subhadip Baidya, Arijit Roy, Sriparna Saha

General AI

Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where patients often express…

Review pending · Role unreviewed · Read now
arxiv Score 21.3

MiniMax Sparse Attention

2026-06-11 · Xunhao Lai, Weiqi Xu, Yufeng Yang, Qiaorui Chen, Yang Xu, Lunbin Zeng, Xiaolong Li, Haohai Sun, Haichao Zhu, Vito Zhang, Pengyu Zhao

General AI

Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untenable at deployment sc…

Review pending · Role unreviewed · Read now
arxiv Score 21.3

Reward Modeling for Multi-Agent Orchestration

2026-06-11 · King Yeung Tsang, Zihao Zhao, Vishal Venkataramani, Haizhou Shi, Zixuan Ke, Semih Yavuz, Shafiq Joty, Hao Wang

General AI

Multi-Agent Systems (MAS) built on Large Language Models (LLMs) require effective orchestration to coordinate specialized agents, yet training such orchestrators is hindered by limited supervision and high computational cost. We propose Orchestration Reward Modeling (OrchRM), a self-supervised framework for evaluating …

Review pending · Role unreviewed · Read now
arxiv Score 21.0

Learning What to Remember: A Cognitively Grounded Multi-Factor Value Model for Agentic Memory

2026-06-11 · Zhibao Chen, Qian Cheng

Research Track A · General AI

Long-running LLM agents accumulate interaction histories far larger than any context window, forcing a standing decision: what to encode deeply, what to forget, and what to retrieve under a fixed memory budget. Production systems answer with semantic similarity or recency -- both mis-specified for the forgetting decisi…

Review pending · Role unreviewed · Read now
arxiv Score 20.5

Speculative Rollback Correction for Quality-Diverse Web Agent Imitation

2026-06-10 · Longkun Hao, Hongyu Lin, Hao Li, Zhichao Yang, Haojie Hao, Dongshuo Huang, Haitao Yang, Hongyu Ge, Ming jie Xie, Yanjun Wu, Zi Hao Yin, Yan Bai, Yihang Lou

Research Track B · General AI

Training interactive web agents through imitation learning from expert trajectories has emerged as a highly effective approach. However, determining the optimal timing for expert intervention presents a critical challenge in this context. Delayed intervention often leads to the accumulation of early-stage errors, pushi…

Review pending · Role unreviewed · Read now
arxiv Score 18.5

Federated continual learning: A comprehensive survey on lifelong and privacy-preserving learning over distributed and non-stationary data

2026-06-09 · Masoume Gholizade, Fabrizio Ruffini, Pietro Ducange, Francesco Marcelloni

Research Track A · General AI

Federated Learning (FL) enables collaborative and privacy-preserving model training across distributed clients, but most existing FL systems implicitly assume data stationarity. In real-world settings such as healthcare, industrial IoT (IIoT), cybersecurity, and smart cities, data streams are inherently non-stationary, …

Review pending · Role unreviewed · Read now
arxiv Score 18.3

Agents-K1: Towards Agent-native Knowledge Orchestration

2026-06-11 · Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang, Fangchen Yu, Zhijie Zhong, Zijie Guo, Tianshuo Peng, Zhuo Liu, Yi Xie, Xiang Zhuang, Yue Fan, Runmin Ma, Shiyang Feng, Xiangchao Yan, Anran Liu, Peng Ye, Wenlong Zhang, Shufei Zhang, Chunfeng Song, Fenghua Ling, Jie Zhou, Liang He, Bo Zhang, Lei Bai

General AI

Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat "cites" edges, omitting key entities, claims, evidence, mechanisms, and method lineages essential for s…

Review pending · Role unreviewed · Read now
arxiv Score 18.3

InterleaveThinker: Reinforcing Agentic Interleaved Generation

2026-06-11 · Dian Zheng, Harry Lee, Manyuan Zhang, Kaituo Feng, Zoey Guo, Ray Zhang, Hongsheng Li

General AI

Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual narratives, guidance, a…

Review pending · Role unreviewed · Read now
huggingface Score 17.5

ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

2026-06-04 · Ashutosh Hathidara, Sai Shruthi Sistla, Sebastian Schreiber, Sahil Bansal

General AI

Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encoders that may under-capture specialized tool semantics, parametric tool retrieval addresses this by encoding each tool as a virtual token appended to t…

Review pending · Role unreviewed · Read now
arxiv Score 17.5

Amnesia: A Stealthy Replay Attack on Continual Learning Dreams

2026-06-10 · Ahmed Sharshar, Naveen Kumar Kummari, Mohsen Guizani

Research Track A · General AI

Continual learning (CL) models often use experience replay to reduce catastrophic forgetting, but their robustness to replay sampling interference remains underexplored. Existing CL attacks alter inputs or training pipelines (poisoning/backdoors) and rarely include explicit auditable constraints, limiting realism. Here…

Review pending · Role unreviewed · Read now
arxiv Score 17.3

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

2026-06-11 · Jundong Xu, Qingchuan Li, Jiaying Wu, Yihuai Lan, Shuyue Stella Li, Huichi Zhou, Bowen Jiang, Lei Wang, Jun Wang, Anh Tuan Luu, Caiming Xiong, Hae Won Park, Bryan Hooi, Zhiyuan Hu

General AI

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated …

Review pending · Role unreviewed · Read now
arxiv Score 17.0

TaskFusion: Continual Anomaly Detection for Heterogeneous Tabular Data

2026-06-10 · Dayananda Herurkar, Federico Raue, Joachim Folz, Jörn Hees, Andreas Dengel

Research Track A · General AI

Continual anomaly detection in tabular data is challenging and remains largely underexplored, particularly in settings with heterogeneous feature schemas, distribution shifts, and severe class imbalance. In many real-world applications, data arrive sequentially from diverse domains, rendering conventional continual lea…

Review pending · Role unreviewed · Read now
arxiv Score 15.3

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

2026-06-11 · Seokju Cho, Ryo Hachiuma, Abhishek Badki, Hang Su, Byung-Kwan Lee, Chan Hee Song, Sifei Liu, Subhashree Radhakrishnan, Seungryong Kim, Yu-Chiang Frank Wang, Min-Hung Chen

General AI

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is bounded by the actio…

Review pending · Role unreviewed · Read now
arxiv Score 15.0

MOSAIC: Modality-Specific Adaptation for Incremental Continual Learning in Parkinson's Disease Gait Assessment

2026-06-11 · Minlin Zeng, Zhipeng Zhou, Yang Qiu, Martin J. McKeown, Zhiqi Shen

Research Track A · General AI

Gait-based Parkinson's disease assessment increasingly relies on heterogeneous sensors, but clinical systems rarely collect all modalities simultaneously. New sensors may arrive through device upgrades, protocol changes, or multi-center deployment, while historical patient data are often unavailable because of privacy …

Review pending · Role unreviewed · Read now
arxiv Score 15.0

The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning

2026-06-11 · Ayushman Trivedi, Bhavika Melwani

Research Track A · General AI

Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverability in continual learning. Using Split CIFAR-100 and a sequentially trained ResNet-18, we analyze …

Review pending · Role unreviewed · Read now
arxiv Score 14.3

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

2026-06-10 · Michal Chudoba, Sergey Alyaev, Petra Galuscakova, Tomasz Wiktorski

General AI

There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require modification to the co…

Review pending · Role unreviewed · Read now
arxiv Score 14.3

A Three-Layer Framework for AI in Scientific Discovery

2026-06-11 · Guojun Liao

General AI

Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but neither fully captures the central act of discovery: the formation and evolution of models. This paper…

Review pending · Role unreviewed · Read now
arxiv Score 14.3

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

2026-06-11 · Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen, Avinash Atreya, Hanjie Chen, Vicente Ordonez

General AI

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, wh…

Review pending · Role unreviewed · Read now
arxiv Score 14.3

Recursive Agent Harnesses

2026-06-11 · Elias Lumer, Sahil Sen, Kevin Paul, Vamse Kumar Subbiah

General AI

Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between these two lines of work…

Review pending · Role unreviewed · Read now
arxiv Score 14.3

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

2026-06-11 · Zihao Wang, Yiming Li, Yutong Wu, Zheyu Liu, Kangjie Chen, Fok Kar Wai, Pin-Yu Chen, Vrizlynn L. L. Thing, Bo Li, Dacheng Tao, Tianwei Zhang

Research Track B · General AI

Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions th…

Review pending · Role unreviewed · Read now
arxiv Score 13.5

Phase model analysis of the effect of M-current on neural synchrony in hippocampal networks

2026-06-10 · Megha Manoj, Sue Ann Campbell

Research Track A

Neural assemblies, transiently coordinated groups of neurons observed in the hippocampus, are thought to underlie the formation of episodic memories. Acetylcholine (ACh), a neuromodulator received by the hippocampus, plays a critical role in memory and learning. A well-supported hypothesis suggests that high l…

Review pending · Role unreviewed · Read now
arxiv Score 13.3

WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

2026-06-11 · Arnav Kumar Jain, Yilin Wu, Jesse Farebrother, Gokul Swamy, Andrea Bajcsy

General AI

The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching -- policy evaluation, policy improvement, and test-time planning -- all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: (i) fidelity…

Review pending · Role unreviewed · Read now
arxiv Score 13.3

LEDGER: A Long-Context Benchmark of Corporate Annual Reports for Grounded Financial Retrieval and Extraction

2026-06-11 · Charles Moslonka, Amaury de Vitry, Arthur Garnier, Hicham Randrianarivo, Emmanuel Malherbe

General AI

Finance reporting is a natural proving ground for large language models, and the very-long-context capabilities of recent models across all sizes make rigorous evaluation in this domain an increasingly pressing need. Yet most public financial resources reduce the task to plain-text SEC 10-K filings paired with a handfu…

Review pending · Role unreviewed · Read now
arxiv Score 13.3

Mana: Dexterous Manipulation of Articulated Tools

2026-06-11 · Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu

General AI

Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool use remains underexplored because of its physical complexity and the difficulty o…

Review pending · Role unreviewed · Read now
arxiv Score 13.3

OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

2026-06-11 · Jiwen Liu, Shujuan Li, Zhixue Fang, Xiaohan Li, Yan Zhou, Zijie Meng, Zhimin Zhang, Yawen Luo, Guoxin Zhang, Yu-Shen Liu, Pengfei Wan

General AI

Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations that fail to handle multi-shot generation or synthesize cross-paired data, which suffer from data scarcity, resulting in …

Review pending · Role unreviewed · Read now
huggingface Score 12.5

See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents

2026-06-11 · Siyi Chen, Xiaoyan Zhang, Meng Wu, Jonathan Tremblay, Valts Blukis, Stan Birchfield, Rene Vidal, Alvaro Velasquez, Sijia Liu, Qing Qu

General AI

Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterog…

Review pending · Role unreviewed · Read now
arxiv Score 12.3

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

2026-06-11 · Yaxin Du, Yifan Zhou, Yujie Ge, Jiajun Wang, Xianghe Pang, Shuo Tang, Tuney Zheng, Bryan Dai, Jian Yang, Siheng Chen

General AI

Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally deterministic tool workflows are unfolded into repeated model-visible decisions, consuming…

Review pending · Role unreviewed · Read now
arxiv Score 12.0

Accurate and Resource-Efficient Federated Continual Learning

2026-06-09 · Jebacyril Arockiaraj, Dhruv Parikh, Jayashree Adivarahan, Rajgopal Kannan, Viktor Prasanna

Research Track A · General AI

Federated continual learning (FCL) must learn from distributed task streams under limited resources, such as communication, computation, memory, and label availability. Existing FCL methods often rely on repeated local optimization, replay, and full supervision. Analytic alternatives avoid iterative training and replay…

Review pending · Role unreviewed · Read now
arxiv Score 11.3

System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5

2026-06-10 · Haotao Xie

General AI

Recently, large language models (LLMs) have achieved promising progress in the fields of classical Chinese translation and the generation of classical poetry. However, domain-specific research on precise translation and affective-semantic understanding of classical poetry remains limited. The main challenge is that mos…

Review pending · Role unreviewed · Read now
arxiv Score 11.3

Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning

2026-06-11 · Zach Studdiford, Gary Lupyan

General AI

When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's behavior does not exhibit the same types of failures because human reasoning use…

Review pending · Role unreviewed · Read now
huggingface Score 10.5

The Cold-Start Safety Gap in LLM Agents

2026-06-05 · Chung-En Sun, Linbo Liu, Tsui-Wei Weng

General AI

Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a session and become substantially safer after a few regular agentic tasks -- a phenomenon we term the cold-start safety gap. To study this systematically, we introduce Safety Ov…

Review pending · Role unreviewed · Read soon
huggingface Score 10.5

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

2026-06-09 · Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik, Colin Raffel

General AI

Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly treating all attacks as equally costly. In practice, the computational expense of different attack strategies can vary by orders of magnitude. Consequently, ASR at a fixed …

Review pending · Role unreviewed · Read now
huggingface Score 10.5

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

2026-06-11 · Yujun Zhou, Kehan Guo, Haomin Zhuang, Xiangqi Wang, Yue Huang, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Nuno Moniz, Nitesh V. Chawla, Xiangliang Zhang

General AI

Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between preference access and preference compliance. In tasks derived from anonymized real-user friction case…

Review pending · Role unreviewed · Read now
huggingface Score 9.5

TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search

2026-06-10 · Zhuofan Shi, Mingzhe Ma, Lu Wang, Fangkai Yang, Pu Zhao, Yiming Guan, Youling Huang, Wei Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan

General AI

Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several directions look plausible but only some will later lead to reliable evidence. If an agent greedily follows the current best-look…

Review pending · Role unreviewed · Read soon
arxiv Score 9.0

CloudCons: A Comprehensive End-to-End Benchmark for Cloud Resource Consolidation

2026-06-11 · Xiaobin Zhang, Lefei Shen, Mouxiang Chen, Zhuo Li, Hongkai Li, Han Fu, Jianling Sun, Xiaoxue Ren, Chenghao Liu

Research Track A · General AI

Driven by conservative over-provisioning to guarantee service reliability, resource utilization in cloud data centers remains at low levels. To mitigate this, the forecast-then-optimize paradigm has emerged to optimize consolidation by anticipating future demands. While emerging time series foundation models promise to…

Review pending · Role unreviewed · Read soon
huggingface Score 8.5

From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion

2026-06-10 · Yuchen Xian, Yunqiu Xu, Yang He, Yi Yang

General AI

Multimodal image fusion aims to integrate complementary information from different modalities into a fused image that preserves rich local details while maintaining globally consistent appearance. Existing approaches build shared representations on 2D feature grids, which excel at modeling local structures but offer li…

Review pending · Role unreviewed · Read soon
arxiv Score 8.3

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

2026-06-11 · Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao, Fanjin Zhang, Jian Song, Lei Hou, Juanzi Li

General AI

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we …

Review pending · Role unreviewed · Read soon
arxiv Score 7.5

OpenRoundup: Multi-Table Data Wrangling Through Interactive Visualization

2026-06-10 · Stephen Kasica, Charles Berret, Tamara Munzner

Research Track A

Data journalists routinely integrate records across multiple independently published sources to support accountability reporting, yet no existing interactive wrangling tool treats the collection of tables -- rather than the single table -- as its primary unit of work. We present OpenRoundup, an open-source, browser-bas…

Review pending · Role unreviewed · Read soon
arxiv Score 7.5

A Sustainable Integrated Framework for Multi-Type Urban Waste Collection and Recycling

2026-06-11 · Víctor Blanco, J. Fernando Camacho-Vallejo, Yolanda Hinojosa

Research Track A

Urban waste management faces increasing operational and environmental challenges driven by population growth, heterogeneous waste streams, traffic congestion, and the need for sustainable collection infrastructures. We present an integrated optimization framework for the design of multi-type urban waste collection and …

Review pending · Role unreviewed · Read soon
arxiv Score 7.3

Automated reproducibility assessments in the social and behavioral sciences using large language models

2026-06-11 · Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger, Stefan Rose, Sarah Ball, Bolei Ma, Frauke Kreuter, Markus Weinmann, Stefan Feuerriegel

General AI

Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches are resource-intensive and difficult to scale. Here, we show that large language models (LLMs) can a…

Review pending · Role unreviewed · Read soon
arxiv Score 7.3

Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models

2026-06-11 · Jialin Gan, Xin Qiu, Guangzhe Chen, Xue Wang

Research Track A · General AI

Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamentally different information structures, making uniform token processing inefficient. In this paper, we…

Review pending · Role unreviewed · Read soon
huggingface Score 6.5

Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback

2026-06-04 · Huaisong Zhang, Hao Yu, Yuxuan Zhang, Jiahe Wang, Xinrui Chen, Haoxiang Cao, Feng Lu, Wendong Zhang, Changqian Yu, Chun Yuan

General AI

Despite generating increasingly photorealistic images, text-to-image (T2I) models still exhibit localized, subtle, and structurally complex failures. Diagnosing these failures requires instance-level feedback that answers where a defect occurs, what type it is, why it is defective, and its importance to overall image q…

Review pending · Role unreviewed · Read later
huggingface Score 6.5

A Stationary (and Therefore Compatible) Representation is All You Need

2026-06-10 · Niccolò Biondi, Federico Pernici, Simone Ricci, Alberto Del Bimbo

General AI

Learning compatible representations aims to learn feature representations that can be used interchangeably over time whenever a model undergoes updates. In this paper, we demonstrate that stationary representations learned by d-Simplex fixed classifiers imply compatibility as in its formal definition. This result estab…

Review pending · Role unreviewed · Read soon
arxiv Score 6.3

Improving Robotic Generalist Policies via Flow Reversal Steering

2026-06-11 · Andy Tang, William Chen, Andrew Wagenmaker, Chelsea Finn, Sergey Levine

General AI

Generalist policies can learn a wide range of skills from diverse robot datasets. In order to solve or improve on challenging new tasks, we need a way to infer and invoke the appropriate actions from the policy's rich behavioral prior, especially when directly commanding the policy fails. We focus on flow matching gen…

Review pending · Role unreviewed · Read soon
huggingface Score 5.5

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

2026-05-30 · Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez

General AI

Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-internalized priors interact with user-provided instructions. We investigate three dimensions of this interaction: (1) how an LLM's familiarity with data and task definitions a…

Review pending · Role unreviewed · Read later
huggingface Score 5.5

HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers

2026-06-11 · Guozhen Zhang, Xuerui Qiu, Yutao Cui, Tianhui Song, Changlin Li, Junzhe Li, Tao Huang, Xiao Zhang, Yang Li, Jianbing Wu, Miles Yang, Zhao Zhong, Liefeng Bo, Limin Wang

General AI

Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation space. In this paper, we present HYDRA-X, the first UMM that unifies image and video tokenization within a single Vision Transformer (ViT). Our design is driven by two core chal…

Review pending · Role unreviewed · Read later
arxiv Score 5.3

Fine-tuning MLIP foundation models: strategies for accuracy and transferability

2026-06-10 · Tamás Lajos Tompa, Eszter Varga-Umbrich, Ilyes Batatia, Alin M. Elena, Noam Bernstein, Gábor Csányi

General AI

Adapting machine-learned interatomic potential (MLIP) foundation models to specialised tasks through fine-tuning is an increasingly important practice, yet systematic guidance on when and how to fine-tune is currently limited. We evaluate seven fine-tuning strategies -- naive full-parameter updates, two layer-freezing …

Review pending · Role unreviewed · Read later
arxiv Score 5.3

GIVE: Grounding Human Gestures in Vision-Language-Action Models

2026-06-11 · Pengfei Liu, Gen Li, Junqiao Fan, Boyu Ma, Jindou Jia, Yang Xiao, Jianfei Yang

General AI

Human communication is inherently multimodal, where language is often accompanied by non-verbal cues such as gestures to convey intentions. However, current Vision-Language-Action (VLA) models treat robotic manipulation as a pure text-driven task, overlooking the important role of gestures in Human-Robot Interaction (H…

Review pending · Role unreviewed · Read later
arxiv Score 5.3

Hölder++: Improving the Quality-Coherence Trade-off in Multimodal VAEs

2026-06-11 · Huyen Vo, María Martínez-García, Isabel Valera

General AI

Existing approaches for multimodal variational autoencoders (VAEs) face a trade-off between generative quality and coherence, i.e., they struggle to generate realistic and diverse samples that, at the same time, are semantically consistent across modalities. A recent work shows that using a simple approximation to Hölde…

Review pending · Role unreviewed · Read later
arxiv Score 5.3

Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

2026-06-11 · Dimitri Kachler, Damien Sileo, Pascal Denis

General AI

As the capabilities of Large Language Models (LLMs) grow, there has been an increasing push to curate high-quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can precondition a model to generate certain …

Review pending · Role unreviewed · Read later
arxiv Score 5.3

LoRA-Muon: Spectral Steepest Descent on the Low-Rank Manifold

2026-06-11 · Franz Louis Cesista, Katherine Crowson, Cédric Simal, Stella Biderman

General AI

Low-Rank Adaptation (LoRA) significantly reduces compute and memory costs for finetuning Deep Learning models but is often harder to tune than dense training: when using factor-wise optimizers such as AdamW, it is sensitive to initialization choices, its optimal learning rates transfer poorly across ranks, and it often…

Review pending · Role unreviewed · Read later
huggingface Score 4.5

Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering

2026-06-09 · Gal Bloch, Ariel Gera, Matan Orbach, Ohad Eytan, Assaf Toledo

General AI

We present Flash-GMM, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single GPU pass. By eliminating the need to materialize the full responsibility matrix in GPU memory, Flash-GMM achieves a 20× speedup over existing implementations and enables training…

Review pending · Role unreviewed · Read later
arxiv Score 4.3

Adapting Prithvi-EO for Fallow Detection for Food-Water Nexus: ViT-Adapter Necks and Parameter-Efficient Backbone tuning of Geospatial Foundation Model

2026-06-10 · Sk Muhammad Asif, Orhun Aydin

General AI

Understanding the spatial distribution of fallow land is important for optimizing the food-water (FW) nexus, given fallowing's role in crop rotation and water conservation. Fallow is a low-accuracy class in the USDA Cropland Data Layer (CDL). The geospatial foundation model (GFM) Prithvi-EO has shown strong transferability across…

Review pending · Role unreviewed · Read later
arxiv Score 4.3

RepWAM: World Action Modeling with Representation Visual-Action Tokenizers

2026-06-11 · Junke Wang, Qihang Zhang, Shuai Yang, Yiming Luo, Yujun Shen, Zuxuan Wu, Yu-Gang Jiang, Yinghao Xu

General AI

This work presents RepWAM, a representation-centric world action model (WAM) built on representation visual-action tokenizers. Existing WAMs typically inherit reconstruction-oriented video tokenizers from pretrained video generation models. Although these tokenizers preserve visual fidelity, pixel reconstruction alone …

Review pending · Role unreviewed · Read later
arxiv Score 4.3

The Moving Drone: Negotiating Agency Between the Voice and the Virtual

2026-06-11 · Nithya Shikarpur, Victor Arul, Anna Huang

General AI

Melodic material in Hindustani music is presented in relation to a tonic, usually sustained by the tanpura, a four-stringed drone instrument. Rooted in Hindustani music, 'The Moving Drone' sets the traditionally static drone into motion that, throughout the performance, gains increasing agency transitioning from reacti…

Review pending · Role unreviewed · Read later