Research Paper Cockpit

Daily Digest - 2026-06-24

Papers first seen in this daily snapshot.


Research Workflow

Latest digest: 2026-07-04.

Papers

62 visible entries

arXiv · Score 36.4

RECALL: Recovery Experience Collection for Active Lifelong Learning in Vision-Language-Action Models

2026-06-22 · Ulas Berk Karli, Tesca Fitzgerald

Research Track A · General AI

Vision-Language-Action (VLA) models are commonly fine-tuned through passive imitation learning, where additional demonstrations are collected for tasks where the policy performs poorly. This approach incurs several downsides: it requires the robot to fail before data collection is triggered, provides little guidance ab…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 24.4

MixedPEFT: Combining Multiple PEFT Methods with Mixed Objectives for Unsupervised Domain Adaptation

2026-06-20 · Mohammed Rawhani, Dervis Karaboga, Ozkan Ufuk Nalbantoglu, Alper Basturk, Bahriye Akay

Research Track A · General AI

Pre-trained language models struggle when applied to new domains, as full fine-tuning is computationally expensive and prone to catastrophic forgetting. This study addresses this challenge by presenting a novel parameter-efficient strategy for unsupervised domain adaptation that combines custom PEFT architectures with …

Review: pending · Role: unreviewed · Read: now

arXiv · Score 23.9

CADRE: Stable, Parameter Efficient Adaptation of Medical Vision Language Models with Bounded Forgetting and Prior Drift

2026-06-22 · Amrita Singh, Rishabh Jha

Research Track A

Medical vision-language models (VLMs) such as BiomedCLIP generalize broadly, but adapting them to a clinical service is as much a safety problem as an accuracy one. Updating a deployed model for a new imaging modality can fail silently in two ways that harm patients: it can forget modalities it already handled (catastr…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 22.2

CineCap: Structured Reasoning with Spatio-Temporal Anchors for Cinematographic Video Captioning

2026-06-23 · Xinyu Mao, Yuhui Zeng, Xiaokun Liu, Wenyu Qin, Meng Wang, Xin Tao, Pengfei Wan, Xiaohan Xing, Max Meng

General AI

Cinematographic captioning aims to describe how a video is filmed using professional film-language concepts such as camera movement, shot size, depth of field, composition, and shooting angle. This capability is important for fine-grained video understanding and controllable movie-quality video generation, yet remains …

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 21.4

Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

2026-06-23 · Shiding Zhu, Yudi Qi, Yajie Wang, Jiaze Li, Chao Song, Yaorui Shi, Yibo Miao, Hanqi Gao, Kai Zhang

General AI

Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent executes tasks, summarizes outcomes, and determines memory content. This setup makes agents …

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 21.4

Qwen-AgentWorld: Language World Models for General Agents

2026-06-23 · Yuxin Zuo, Zikai Xiao, Li Sheng, Fei Huang, Jianhong Tu, Yuxuan Liu, Tianyi Tang, Xiaomeng Hu, Yang Su, Qingfeng Lan, Yantao Liu, Qin Zhu, Yinger Zhang, Bowen Yu, Haiquan Zhao, Haiyang Xu, Jianxin Yang, Jiayang Cheng, Junyang Wang, Lianghao Deng, Mingfeng Xue, Tianyi Bai, Yang Fan, Yubo Ma, Yucheng Li, Zeyu Cui, Zhihai Wang, Zhihui Xie, Zhuorui Ye, An Yang, Dayiheng Liu, Jingren Zhou, Ning Ding

General AI

A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling based on language models can further push the boundaries of general agents. (i) We first focus on building foundation m…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 21.2

Are We Ready For An Agent-Native Memory System?

2026-06-23 · Wei Zhou, Xuanhe Zhou, Shaokun Han, Hongming Xu, Guoliang Li, Zhiyu Li, Feiyu Xiong, Fan Wu

Research Track A · General AI

Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that supports persistent information storage, retrieval, update, consolidation, and dynamic lifecycle governance throughout agent execution. Despite this evolution, existing evaluati…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 20.2

Agentic Collaborative Cognition for Zero-Shot 3D Understanding

2026-06-23 · Wenxin Wang, Bo Zhang, Feng Chen, Zixuan Wang, Wen Li, Changsheng Li, Yinjie Lei

General AI

Recent advancements have explored agentic zero-shot 3D understanding by reformulating it as video keyframe understanding with Multimodal Large Language Models (MLLMs). However, existing methods face an intrinsic bottleneck due to the finite observation perspectives inherent in videos and the implicit perception of 3D s…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 19.8

MIRAGE: Stealthy Visual Prompt Injection for Vulnerability Detection in Web Agents

2026-06-16 · Xuelong Dai, Jianyu Ma, Boyang Ma, Biwei Yan, Yijun Yang, Yue Zhang

Research Track B · General AI

Multimodal Large Language Model (MLLM)-based web agents provide practical, high-precision solutions for visual browser automation; however, they inherently expand the attack surface, introducing novel vision-based vulnerabilities. Existing adversarial evaluations targeting these agents frequently rely on permissive thr…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 19.5

Social World Model for Lifelong Social Intelligence

2026-06-19 · Yu Luo

Research Track A · General AI

Social intelligence is a core competency for language agents, yet current research primarily focuses on static capability evaluation rather than how these skills are continuously shaped and accumulated. This gap calls for a shift toward sustainable learning paradigms. Currently, two methodological pain points exist: so…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 18.9

Attention-Spectrum Regularization for Replay-Free Continual Multimodal LLMs

2026-06-22 · Chuangxin Zhao, Canran Xiao, Siyuan Ma, Mengyao Lyu, Yanbiao Ma, Jun Xia, Guiguang Ding, Yang Liu

Research Track A · General AI

Multimodal large language models (MLLMs) are increasingly required to adapt to non-stationary streams of visual domains, question types, and user instructions, yet continual fine-tuning often causes severe forgetting of previously acquired multimodal skills. Existing continual vision-language methods mainly preserve ou…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 18.2

SHERLOC: Structured Diagnostic Localization for Code Repair Agents

2026-06-23 · Hovhannes Tamoyan, Sean Narenthiran, Erik Arakelyan, Mira Mezini, Boris Ginsburg

General AI

LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedicated localization frameworks have emerged, yet are still evaluated as file retrieval rather than actionable diagnosis, producing locations without the diagnostic context a re…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 18.2

UniDrive: A Unified Vision-Language and Grounding Framework for Interpretable Risk Understanding in Autonomous Driving

2026-06-23 · Xiaowei Gao, Pengxiang Li, Yitai Cheng, Ruihan Xu, James Haworth, Stephen Law, Yun Ye

General AI

Recent multimodal large language models (MLLMs) have shown strong potential for autonomous driving scene understanding, yet existing methods still face a fundamental trade-off between temporal reasoning and spatial precision. Models that rely on single-frame or low-resolution inputs often miss small, distant, or partia…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 17.5

Gradient-Free Warm-Start Library Recovery: an Amortized-Regret Separation

2026-06-19 · Jianwei Lou

Research Track A · General AI

Continual learning that is gradient-free, local, online, and append-only is attractive for edge and streaming deployment, but its value is usually argued informally. We give a provable account on recurring-regime streams. Given segmentation, a warm-start library learner attains amortized recovery cost $O\!\big(KD/\vare…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 17.0

Task-Differentiated Atomic Skill Expansion and Routing for Continual Learning Across Highly Heterogeneous Tasks

2026-06-19 · Jiacheng Wang, Xinjia He, Qi Ding, Yutao Yang, Jie Zhou, Liyang Yu, Liang Dou, Qin Chen

Research Track A · General AI

Continual learning (CL) is commonly studied under the assumption that sequential tasks are semantically related or structurally similar. However, in highly heterogeneous settings, where tasks differ substantially in reasoning patterns and input-output formats, existing methods often suffer from catastrophic forgetting …

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 16.4

ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection

2026-06-23 · Chenhao Dang, Dantong Zhu, Jun Yang, Conghui He, Weijia Li

General AI

Multimodal misinformation detection is increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text–image framing errors. Existing benchmarks and methods remain poorly matched to this setting: they usually isolate short captions, single images, …

Review: pending · Role: unreviewed · Read: now

arXiv · Score 16.2

EG-VQA: Benchmarking Verifiable Video Question Answering with Grounded Temporal Evidence

2026-06-23 · Linpeng Huang, Weixing Chen, Zexin Chen, Yang Liu, Liang Lin

General AI

Recent advances in Video Large Language Models (Video-LLMs) have yielded promising performance on video question answering (VideoQA). Nevertheless, existing benchmarks are predominantly evaluated through answer correctness, while the grounding of predictions in relevant video evidence remains largely unexamined. This d…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 16.2

IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation

2026-06-23 · Zixuan Li, Haokun Lin, Yicheng Xiao, Zhiwei Li, Xinyang Song, Zelong Zheng, Yong He, Heng Yao, Ke Ding, Chao Yu, Chuan Yuan, Qi Li, Zhenan Sun

General AI

Unified multi-modal large language models (MLLMs) have achieved strong text-to-image generation quality, but still struggle with structure-aware prompt following, where object counts, spatial relations, attribute bindings, and coarse layouts must be preserved. We attribute this limitation in part to the entanglement of…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 15.2

Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

2026-06-23 · Ali Pourghasemi Fatideh, Wilder Baldwin, Maria Dhakal, Collin McMillan, Sepideh Ghanavati

General AI

LLM-based dialogue assistants have become mainstream tools for software developers, yet current evaluation benchmarks focus exclusively on functional correctness. This leaves a critical gap in assessing the quality and accuracy of these conversations when handling Non-Functional Requirements (NFRs), which are inherentl…

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 15.0

LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis

2026-06-11 · Shihao Xu, Tiancheng Zhou, Jiatong Ma, Yanli Ding, Yiming Yan, Ming Xiao, Guoyi Li, Haiyang Geng, Yunyun Han, Jianhua Chen, Yafeng Deng

General AI

Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment. Progress in AI-assisted psychiatric diagnosis is constrained by the absence of benchmarks that simu…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 14.9

CFAgentBench: A Reproducible Environment and Benchmark for Autonomous Construction-Finance Agents

2026-06-20 · Rishi Srivastava

Research Track B · General AI

We introduce CFAgentBench, a reproducible, self-hostable environment and benchmark for autonomous construction-finance agents: a CFO/controller-class agent operating across the real software stack a US construction finance team runs - ERP, project management, email, documents, pay applications, payroll, certified payro…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 14.8

Multi-Agent Transactive Memory

2026-06-18 · To Eun Kim, Xuhong He, Dishank Jain, Ambuj Agrawal, Negar Arabzadeh, Fernando Diaz

Research Track B · General AI

The decentralized deployment of LLM agents with diverse capabilities across diverse tasks motivates infrastructure for knowledge sharing across heterogeneous agent populations. Just as search engines index human-generated artifacts to support human problem solving, retrieval systems can organize agent-generated artifac…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 14.5

When Web Agents Finish but Still Fail: Reproducible Triggers and Trace Diagnostics for Parallel Web Exploration

2026-06-16 · Aagam Sogani, Botao Rui, Swetha Vaidyanathan, Rishi Agarwal, Minghao Yan, Shivaram Venkataraman

Research Track B · General AI

Long-horizon web agents often fail in ways hidden by final-answer evaluation: they may visit useful pages, produce a well-formed answer, and terminate confidently while still missing fields, over-including unsupported items, or relying on stale evidence. We study these failures with Parallel WebBench, a parallel web-ex…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 14.4

Black-Box Continual Learning for Vision-Language Models

2026-06-22 · Yuting Li, Weihang Fang, Haoyuan Gao, Linghe Kong, Yexin Li, Lichao Sun, Weiran Huang

Research Track A · General AI

The rapid deployment of Vision-Language Models (VLMs) in dynamic environments necessitates the ability to learn continuously without forgetting. However, traditional continual learning (CL) settings often rely on white-box paradigms, which are increasingly invalidated by the shift toward cloud-hosted models. In this pap…

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 14.4

DREAM: Dense Retrieval Embeddings via Autoregressive Modeling

2026-06-23 · Yixuan Tang, Yi Yang

General AI

Dense retrieval embedding models are a fundamental component of modern retrieval-based AI systems. Most dense retrievers are trained with contrastive objectives, which require labeled positive and negative document pairs that are often costly and difficult to obtain. In this work, we investigate whether the autoregress…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 13.2

FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation

2026-06-23 · Haorui Ji, Weizhe Liu, Hongdong Li, Hengkai Guo

General AI

Sparse voxel representation has emerged as a scalable foundation for image-to-3D Gaussian Splatting (3DGS) generation, yet current methods struggle to preserve high-frequency visual details of input images due to two structural bottlenecks. First, they adopt discriminative 2D features optimized for semantic abstraction…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 13.2

Scaling Laws for Task-Specific LLM Distillation

2026-06-23 · Lavinia Ghita, Dhruv Desai, Ioana Boier

General AI

Large Language Models (LLMs) achieve strong performance across a growing range of domains, yet their scale poses deployment challenges in applications where latency and cost constraints are critical. This paper derives empirical scaling laws for domain-specific LLM compression, quantifying how in-domain and general kno…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 12.4

Fast and Slow Variational Continual Learning

2026-06-22 · Subarnaduti Paul, Yohan Jung, Mohammad Emtiyaz Khan, Siddharth Swaroop, Thomas Möllenhoff, Martin Mundt

Research Track A · General AI

Continual learning remains a major challenge for modern deep networks, partly because commonly used optimizers lack inherent mechanisms for continual adaptation. One such natural mechanism is fast and slow adaptation to balance stability and plasticity. This mechanism has deep roots in neuroscience and biology, but the…

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 12.4

FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning

2026-06-23 · Xirui Li, Zhe Liu, Xiaoqing Ye, Wenhua Han, Yifeng Pan, Junyu Han, Hengshuang Zhao

General AI

Multimodal driving planning faces a long-standing tension between two paradigms: scoring-based methods benefit from dense reward supervision but are confined to a fixed action vocabulary, while anchor-based methods generate proposals dynamically yet suffer from sparse supervision constrained to a single ground-truth tr…

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 12.4

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

2026-06-23 · Chenhao Dang, Jing Ma, Mingjie Liao

General AI

The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data mixtures during training, has emerged as a promising direction to improve efficiency. Howeve…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 12.2

Grad Detect: Gradient-Based Hallucination Detection in LLMs

2026-06-23 · Anand Kamat, Daniel Blake, Brent M. Werness

General AI

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet they remain prone to generating hallucinations. Detecting these hallucinations is critical for deploying LLMs reliably in high-stakes applications. We present Grad Detect, a gradient-based approach for predicting hallucinat…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 12.2

PHANTOM: A Large-Scale Dataset of Multimodal Adversarial Attacks for Vision-Language Models

2026-06-23 · Simone Gallivanone, Hossein Khodadadi, Mauro Dore, Mauro Medda, Nicola Franco

General AI

We introduce a large-scale, open-source dataset of pre-generated adversarial attacks for vision-language models (VLMs). The dataset is designed to be diverse, representative, and practical, extending existing benchmarks by covering 10 high-level categories and 55 subcategories of harmful intents. Our primary goal is to…

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 11.4

AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction

2026-06-22 · Shanhui Zhao, Jiacheng Liu, Guohong Liu, Jichao Yan, Jialei Ye, Yuhao Yang, Hao Wen, Shizuo Tian, Yizhen Yuan, Yuxuan Chen, Yunxin Liu, Ju Ren, Ya-Qin Zhang, Chao Huang, Yao Guo, Yuanchun Li

General AI

AI agents are driving a new software paradigm, with the ability to autonomously call tools, extract information, manage memory, and complete tasks that span applications and data sources. Most existing end-user operating systems, however, are designed for application-centric workflows and offer little native support fo…

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 11.4

Are Text-to-Image Models Inductivist Turkeys? A Counterfactual Benchmark for Causal Reasoning

2026-06-23 · Jiayi Lei, Yuandong Pu, Xingyu Han, Rongpeng Zhu, Jing Xu, Jinyao Wang, Zijian Zhou, Bin Fu, Yuewen Cao, Yihao Liu, Yongsheng Li

General AI

Text-to-image (T2I) generation models have achieved remarkable progress in producing visually realistic images from natural language prompts. Yet it remains unclear whether their success reflects genuine causal understanding or sophisticated pattern matching over visual-textual correlations. Inspired by Russell's induc…

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 11.4

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

2026-06-23 · Yuru Wang, Lejun Cheng, Yuxin Zuo, Sihang Zeng, Bingxiang He, Che Jiang, Junlin Yang, Yuchong Wang, Kaikai Zhao, Weifeng Huang, Kai Tian, Zhenzhao Yuan, Jincheng Zhong, Weizhi Wang, Ning Ding, Bowen Zhou, Kaiyan Zhang

General AI

We introduce NatureBench, a cross-discipline benchmark of 90 tasks distilled from peer-reviewed Nature-family publications, designed to evaluate whether AI coding agents can move beyond reproduction toward discovery on real scientific problems. NatureBench is built on NatureGym, an automated pipeline that constructs a …

Review: pending · Role: unreviewed · Read: now

arXiv · Score 11.2

BenchX: Benchmarking AI Models for Cancer Detection and Localization with Demographic and Protocol Biases

2026-06-23 · Qi Chen, Wenxuan Li, Pedro R. A. S. Bassi, Xinze Zhou, Jakob Wasserthal, Ibrahim Ethem Hamamci, Sezgin Er, Ashwin Kumar, Yiwen Ye, Yuhan Wang, Yuyin Zhou, Akshay S. Chaudhari, Curtis Langlotz, Kang Wang, Yang Yang, Alan L. Yuille, Zongwei Zhou

General AI

Artificial intelligence (AI) has achieved remarkable success in medical imaging, but it is widely recognized that these models often perform inconsistently across real-world clinical settings. Such inconsistencies occur when patient demographics and imaging protocols vary, for example, in detecting small tumors, analyz…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 11.0

Whose Agent Are You? Multi-Layer Fingerprinting and Attribution of Autonomous Web Agents

2026-06-18 · Dayeon Kang, Hyejun Jeong, Jade Sheffey, Pubali Datta, Amir Houmansadr

Research Track B · General AI

As AI web agents proliferate, combining large language models with autonomous, browser-level control, indiscriminate content scraping by web agents has emerged as a privacy and security challenge. Existing defenses, such as robots.txt and active bot-blocking, are insufficient, as they are widely violated and easily cir…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 10.9

Can Scale Save Us From Plasticity Loss in Large Language Models?

2026-06-23 · J. Fernando Hernandez-Garcia, Tomás Figliolia, Beren Millidge

Research Track A · General AI

The loss of plasticity - the ability of a network to learn new information after having already learned older information - is a fundamental challenge in creating artificial neural networks capable of continual learning. Although this phenomenon has been known for decades, it has mostly been studied in older, relativel…

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 10.7

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

2026-06-18 · Guangyi Liu, Gao Wu, Congxiao Liu, Pengxiang Zhao, Liang Liu, Mading Li, Qi Zhang, Mengyan Wang, Liang Guo, Yong Liu

Research Track B · General AI

MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to…

Review: pending · Role: unreviewed · Read: soon

Hugging Face · Score 10.4

World Value Models for Robotic Manipulation

2026-06-23 · Zhihao Wang, Jianxiong Li, Yu Cui, Yuan Gao, Xianyuan Zhan, Junzhi Yu, Xiao Ma

General AI

Generalist value models play a pivotal role in scaling robotic policy learning from large-scale, mixed-quality data. Mathematically, accurate value estimation demands deep temporal understanding, requiring models to both ground the current belief using historical context and plan over future outcomes. However, most exi…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 10.2

DiffusionBench: On Holistic Evaluation of Diffusion Transformers

2026-06-23 · Xingjian Leng, Jaskirat Singh, Zhanhao Liang, Ethan Smith, Martin Bell, Aninda Saha, Yuhui Yuan, Liang Zheng

General AI

Diffusion transformer (DiT) research on image generation has converged to a single evaluation setup: class-conditional generation on ImageNet. While methods improve the FID and related metrics, it is increasingly unclear whether they reflect real progress in generative modeling. The natural alternative, i.e., text-to-i…

Review: pending · Role: unreviewed · Read: now

arXiv · Score 10.2

FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

2026-06-23 · Orest Kupyn, Goutam Bhat, Philipp Henzler, Fabian Manhardt, Christian Rupprecht, Federico Tombari

General AI

Generating explorable 3D scenes from a single image requires strong generative priors and accurate geometric representations suitable for downstream use. Current video diffusion models offer high-quality generation and implicitly encode multi-view geometric structure in latent space. However, existing feedforward laten…

Review: pending · Role: unreviewed · Read: now

Hugging Face · Score 10.0

EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies

2026-06-18 · Ganlin Yang, Zhangzheng Tu, Yuqiang Yang, Sitong Mao, Junyi Dong, Tianxing Chen, Jiaqi Peng, Jing Xiong, Jiafei Cao, Jifeng Dai, Wengang Zhou, Yao Mu, Tai Wang

General AI

Memory remains a critical bottleneck for long-horizon robotic manipulation, as standard Vision-Language-Action (VLA) policies often fail when task-relevant cues become occluded or unobservable over time. While existing memory-augmented methods utilize historical context, they either suffer from severe information bottl…

Review: pending · Role: unreviewed · Read: soon

arXiv · Score 9.2

Bayesian Adaptation Gym: A Benchmark for the Bayesian Low-Rank Adaptation of Multi-Modal Language Models

2026-06-20 · Colin Samplawski, Ramneet Kaur, Manoj Acharya, Anirban Roy, Adam D. Cobb

General AI

Large multi-modal language models are increasingly deployed in high-stakes domains, making well-calibrated uncertainty essential. Traditional Bayesian methods approximate posteriors over all model weights, which becomes intractable for modern large models. For this reason, recent work instead considers Bayesian low-ran…

Review: pending · Role: unreviewed · Read: soon

arXiv · Score 9.2

"Zooming In" on Agentic Web Browsers as Assistive Technologies: A Case Study with a Low-Vision Technology Expert

2026-06-23 · Laura Colazzo, Giuseppe Anzillotti

General AI

Agentic Web Browsers (AWBs), powered by Large Language Models (LLMs), are emerging as autonomous systems capable of navigating the Web on behalf of users. Beyond enhancing productivity, they could also offer significant promise as Assistive Technologies (ATs) for visually-impaired individuals, transforming web interact…

Review: pending · Role: unreviewed · Read: soon

arXiv · Score 9.2

OpenThoughts-Agent: Data Recipes for Agentic Models

2026-06-23 · Negin Raoof, Richard Zhuang, Marianna Nezhurina, Etash Guha, Atula Tejaswi, Ryan Marten, Charlie F. Ruan, Tyler Griggs, Alexander Glenn Shaw, Hritik Bansal, E. Kelly Buchanan, Artem Gazizov, Reinhard Heckel, Chinmay Hegde, Sankalp Jajee, Daanish Khazi, Emmanouil Koukoumidis, Xiangyi Li, Hange Liu, Shlok Natarajan, Harsh Raj, Nicholas Roberts, Ethan Shen, Nishad Singhi, Michael Siu, Ashima Suvarna, Hanwen Xing, Patrick Yubeaton, Robert Zhang, Leon Liangyu Chen, Xiaokun Chen, Steven Dillmann, Saadia Gabriel, Xunyi Jiang, Anurag Kashyap, Boxuan Li, Yein Park, Minh Pham, Sujay Sanghavi, Lin Shi, Ke Sun, Yixin Wang, Zhiwei Xu, Erica Zhang, Siyan Zhao, Wanjia Zhao, Jenia Jitsev, Alex Dimakis, Benjamin Feuer, Ludwig Schmidt

General AI

Agentic language models dramatically expand the applications of AI, yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that ge…

Review: pending · Role: unreviewed · Read: soon

arXiv · Score 9.2

World Models in Pieces: Structural Certification for General Agents

2026-06-23 · Yikai Lu, Yifei Wu, Xinyu Lu, Tongxin Li

General AI

In the big-world regime, agents cannot be universally capable and their ability is inevitably specialized across a world model in pieces. Consequently, standard uniform guarantees fail to distinguish between the understanding of critical bottlenecks and irrelevant failures. We first formalize this limitation by proving…

Review: pending · Role: unreviewed · Read: soon

arXiv · Score 8.2

Grading the Grader: Lessons from Evaluating an Agentic Data Analysis System

2026-06-23 · Tian Zheng, Kai-Tai Hsu

General AI

Agentic data analysis systems produce rich outputs, including code, numerical results, and verbal diagnostics. This makes them more challenging to evaluate than single-turn LLM responses. It is therefore necessary to distinguish genuine disagreement between an agent's output and a ground-truth answer from grading artif…

Review: pending · Role: unreviewed · Read: soon

arXiv · Score 8.2

Vision-Language Model Reasoning for Contextual Semantic Mapping in Intralogistics

2026-06-23 · Marvin Rüdt, Hao Pang, Constantin Enke, Zäzilia Seibold, Kai Furmans

General AI

Autonomous mobile robots operating in intralogistics environments rely on geometric maps for localization and navigation, but lack semantic understanding of objects and their contextual properties. We present a contextual semantic mapping pipeline that combines SLAM-based geometric mapping, SAM-based instance segmentat…

Review: pending · Role: unreviewed · Read: soon

arXiv · Score 7.2

CoRDE: Concept-Prior Routed Diffusion Experts for Structural Generalization in Robot Manipulation

2026-06-20 · Haidong Huang, Xixin Zhao, Yaohua Zhou, Jiayu Song, Jiayi Zhang, Jun Ma, Haiyue Zhu, Xiaocong Li

General AI

Diffusion models excel at capturing multi-modal action distributions in robot imitation learning. However, in multi-task and long-horizon scenarios, monolithic architectures lack structural generalization capabilities, suffering from gradient conflicts between distinct semantic sub-stages. While pure data-driven Mixtur…

Review: pending · Role: unreviewed · Read: later

arXiv · Score 7.2

Virtual Simulation for Mental Health

2026-06-23 · Anna Fang

General AI

Poorly designed interventions or those deployed without adequate safeguards can harm the communities they aim to serve, thus exacerbating existing vulnerabilities and leaving individuals unsupported. This is especially the case for the mental health context, where there is a growing trend of relying on technological in…

Review: pending · Role: unreviewed · Read: soon

Hugging Face · Score 7.0

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

2026-06-18 · Luca Zedda, Davide Antonio Mura, Cecilia Di Ruberto, Maurizio Atzori, Muhammed Furkan Dasdelen, Carsten Marr, Andrea Loddo

General AI

Attention-based Multiple Instance Learning aggregators in medical imaging are prone to attention concentration, producing overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this through four synergistic architectural components: RMSNorm-based pre-normalization, pe…

Review: pending · Role: unreviewed · Read: soon

arXiv · Score 6.8

TerraMARS: A Domain-Adapted Small-Language-Model Pipeline for Mars Terraforming Literature

2026-06-18 · Jyotsna Singh, Ash Black, Jeff Larsen, Scott R. Saleska

General AI

Researchers are interested in learning about Mars so that it may eventually become habitable for humans. To achieve this, there is a need for comprehensive knowledge of the planet's atmosphere, hydrology, surface chemistry, radiation environment, and spatial features through the scientific literature. These contain val…

Review: pending · Role: unreviewed · Read: later

arXiv · Score 6.2

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

2026-06-23 · Zhuoren Ye, Tianyu Wo, Dinghao Xue, Mingming Zhang, Yuchen Teng, Chunming Hu, Renyu Yang

General AI

Emerging LLM services increasingly host many sparse MoE models, yet most models receive sparse requests and remain cold. This creates a GPU memory problem: model weights are stable and model-determined, while KV-cache is transient and demand-determined. Because cold models rarely reach peak KV-cache demand at the same …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.2

InSight: Self-Guided Skill Acquisition via Steerable VLAs

2026-06-23 · Maggie Wang, Lars Osterberg, Stephen Tian, Ola Shorinwa, Jiajun Wu, Mac Schwager

General AI

Vision-language-action (VLA) models can learn manipulation skills from demonstrations, but their capabilities are bounded by the skills in the training data. We present InSight, a framework that unlocks autonomous skill acquisition by rendering VLAs steerable at the primitive-action level (e.g., "move gripper to the bo…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

FirstPass: Grounding AI Scientific Judgment in Multi-Round Editorial Outcomes

2026-06-18 · Prabhjot Singh, Somnath Luitel, Manmeet Singh, Josh Durkee

General AI

AI systems for peer review fail on three fronts: they train on Computer Science and Machine Learning venues alone, ignore the iterative dialogue that validates science, and evaluate on stylistic mimicry rather than real editorial judgment. We introduce FirstPass, a dataset and fine-tuned model that addresses all three.…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.4

FedOT: Ownership Verification and Leakage Tracing via Watermarks for Federated LDMs

2026-06-22 · Wenlong Cheng, Yuan Gan, Yunqiu Xu, Jiaxu Miao

General AI

Training Latent Diffusion Models (LDMs) within Federated Learning (FL) has attracted increasing attention due to its ability to combine the powerful generative capacity of LDMs with the privacy-preserving properties of FL. However, FL requires sharing the global model with multiple participants, which risks unauthorize…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Subspace-Constrained Federated Learning with Low-Rank Adaptation

2026-06-21 · Neranjan Senarath, Rohit Muralitharan, Sadia Asif

General AI

Federated low-rank adaptation methods are attractive for fine-tuning large models under communication and privacy constraints, but heterogeneous client data can induce geometric misalignment between local low-rank updates. We study whether this subspace misalignment leads to destructive aggregation and slower convergen…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Bounds for Standard Errors in Combined Data

2026-06-23 · Jooyoung Cha, Yuya Sasaki, Nelson Matthew P. Tan

General AI

We propose methods for constructing lower bounds on the standard errors of parameters estimated from moment conditions obtained across different samples. Sharp explicit bounds are derived by exploiting geometric inequalities when no information about correlations across samples is available. Furthermore, we develop com…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Material identification using laboratory X-ray beam tracking: quantitativeness and signal-to-noise ratio requirements

2026-06-23 · Sumera Rehman, Ashkan Ajeer, Connor Darling, Marco Endrizzi, Alessandro Olivo, Silvia Cipiccia

General AI

Simultaneous structural and elemental characterisation of a specimen in a non-destructive manner is an instrumental approach with applications in a variety of fields, including energy materials, cultural heritage and life sciences. This is routinely performed at synchrotron facilities, e.g. by combining X-ray imaging an…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Real vs. Complex Spectral Bases for Neural Operators: The Role of Green's Function Alignment

2026-06-23 · Jason Sulskis, Sathya Ravi

General AI

Fourier Neural Operators (FNO) learn solution operators of partial differential equations by parameterizing global convolutions in the complex Fourier domain. For real-valued PDE solutions, the complex FFT carries representational redundancy through conjugate symmetry. We introduce the Hartley Neural Operator (HNO), th…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.2

Stability Checking of Markov Jump Linear Systems via Probabilistic Temporal Logic (Extended Version)

2026-06-23 · Lena Becker, Holger Hermanns

General AI

Markov jump linear systems (MJLSs) model dynamical phenomena subject to random switching among multiple linear modes, driven by an underlying Markov chain. Classical notions such as mean and mean-square stability characterize the long-term asymptotic behaviour of the first and second moments of an MJLS, but they can be…

Review
pending
Role
unreviewed
Read
later