Research Paper Cockpit

Daily Digest - 2026-06-18

Papers first seen in this daily snapshot.

Papers

40 visible entries

arxiv Score 23.3

Native Active Perception as Reasoning for Omni-Modal Understanding

2026-06-17 · Zhenghao Xing, Ruiyang Xu, Yuxuan Wang, Jinzheng He, Ziyang Ma, Qize Yang, Yunfei Chu, Jin Xu, Junyang Lin, Chi-Wing Fu, Pheng-Ann Heng

General AI

Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre-scanning, and their context cost still …

Review: pending · Role: unreviewed · Read: now
arxiv Score 21.3

Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

2026-06-17 · Shengyuan Ding, Xilin Wei, Xinyu Fang, Haodong Duan, Dahua Lin, Jiaqi Wang, Yuhang Zang

Research Track A · General AI

Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. However, existing benchmarks either expose the full state, conflate hidden-state reconstruction with other agent skills, or test recall only after an episode has ended. W…

Review: pending · Role: unreviewed · Read: now
huggingface Score 19.5

SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks

2026-06-14 · Jingru Guo, Xiangyuan Xue, Lian Zhang, Wanghan Xu, Siki Chen, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin

General AI

Frontier scientific reasoning remains a major challenge for large language models (LLMs), where even the strongest commercial systems fall short of expert-level performance. A closer look at model behavior reveals substantial complementarity that single-model evaluation hides: different frontier models excel on differe…

Review: pending · Role: unreviewed · Read: now
huggingface Score 19.5

Guava: An Effective and Universal Harness for Embodied Manipulation

2026-06-16 · Haowen Liu, Xirui Li, Shaoxiong Yao, Peng Shi, Tianyi Zhou, Jia-Bin Huang, Furong Huang, Jiayuan Mao

General AI

Language models trained on large-scale vision-language data have demonstrated strong potential for embodied agents. Harnessing models through embodied tool use offers a promising alternative to end-to-end vision-language-action systems by combining high-level reasoning with external modules for perception, planning, a…

Review: pending · Role: unreviewed · Read: now
huggingface Score 18.5

IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products

2026-06-12 · Haonan Qi, Jin Cao, Yongqi Zhang, Xintong Wang, Weidong Tang, Bin Chen, Chengfu Huo, Haojun Pan, Hengyu You, Jing Li, Yingde Wang, Liang Ding

General AI

Industrial products such as valves and circuit breakers are defined by dense technical specifications that govern procurement, compatibility, and safety across supply chains. These specifications are scattered across multiple heterogeneous product images, including specification tables, nameplates, and technical drawin…

Review: pending · Role: unreviewed · Read: now
arxiv Score 18.3

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

2026-06-17 · Mohamed Nabail, Leo Cheng, Jingmin Wang, Nicholas Rhinehart

General AI

Preference-based RL provides an approach to learning reward models from pairwise comparisons of behaviors, bypassing the need for explicit reward design. However, existing methods typically rely on passive data collection and suffer from poor sample efficiency, especially during the early stages of learning. We introdu…

Review: pending · Role: unreviewed · Read: now
arxiv Score 15.3

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

2026-06-17 · Ruida Wang, Rui Pan, Pengcheng Wang, Shizhe Diao, Tong Zhang

General AI

Enhancing the formal math reasoning capabilities of Large Language Models (LLMs) has become a key focus in both the mathematics and computer science communities in recent years. While significant progress has been made in using state-of-the-art Auto-Regressive (AR) LLMs for formal theorem proving, these models suffer from…

Review: pending · Role: unreviewed · Read: now
arxiv Score 15.3

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

2026-06-17 · Leyang Shen, Yang Zhang, Xiaoyan Zhao, Chun Kai Ling, Tat-Seng Chua

General AI

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, this divide-and-conquer paradigm falls short on decision-making tasks that are also prevalent in the real world. These tas…

Review: pending · Role: unreviewed · Read: now
huggingface Score 14.5

Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

2026-06-13 · Shubhang Bhatnagar, Dheeraj Baiju, Narendra Ahuja

General AI

Vision encoders for retrieval are typically trained with class-label supervision: each training pair reduces to a scalar that uniformly pushes the embedding apart or pulls it together, as if every visual attribute either differed or matched. A multimodal large language model (MLLM), shown the same pair, can articulate …

Review: pending · Role: unreviewed · Read: now
huggingface Score 14.5

PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

2026-06-16 · Yuhang Huang, Xuan Lv, Junyan Xu, Zhiyuan Yu, Jiazhao Zhang, Ruizhen Hu, Wancheng Feng, Shilong Zou, Hewen Xiao, Ziqiao Zhou, Kaiyun Huang, Zhiyu Peng, Juzhan Xu, Hang Zhao, Chenyang Zhu, Renjiao Yi, Yifei Huang, Douhui Wu, Yan Zhang, Kexu Cheng, Chunhe Song, Yunzhi Xue, Xiuhong Zhang, Leitao Guo, Yunji Chen, Bin Wu, Haibin Yu, Kai Xu

General AI

World foundation models (WFMs) are powerful simulators, yet they predominantly operate in a single-view setting and lack the multi-view 3D consistency required for robotic manipulation. While robotic systems rely on multiple cameras (egocentric, eye-to-hand, and wrist-mounted) for policy learning, current multi-view wo…

Review: pending · Role: unreviewed · Read: now
arxiv Score 14.3

A Unified Framework for Efficient Remote Sensing Visual Question Answering: Adapting Dual, Hybrid, and Encoder-Decoder Architectures

2026-06-17 · Timothy Agboada, Shikha Chandel, Yadav Raj Ghimire, Leila Hashemi-Beni

General AI

Visual Question Answering (VQA) in the Remote Sensing (RS) domain presents unique challenges due to the high resolution, multi-scale object distribution, and semantic complexity of aerial imagery. While general-domain Foundation Models have achieved remarkable success, their direct application to RSVQA is hindered by m…

Review: pending · Role: unreviewed · Read: now
arxiv Score 14.3

Learning User Simulators with Turing Rewards

2026-06-17 · Yingshan Susan Wang, Cedegao E. Zhang, Linlu Qiu, Zexue He, Pengyuan Li, Alex Pentland, Roger P. Levy, Yoon Kim

General AI

Learning to simulate human users in interactive settings could advance the training of agent assistants, evaluation of personalization systems, research in the social sciences, and more. Existing approaches generally do so by training a large language model (LLM) to match a single ground truth response, either by maxim…

Review: pending · Role: unreviewed · Read: now
arxiv Score 14.3

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

2026-06-17 · Siyi Gu, Jialin Chen, Sophia Zhou, Arman Cohan, Rex Ying

General AI

Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. Distillation often relies on chain-of-thought annotations that are expensive to obtain and may themselves be noisy, incomplete, or partially incorrect; even when the final solutio…

Review: pending · Role: unreviewed · Read: now
arxiv Score 13.3

Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents

2026-06-17 · Anoushka Vyas, Aarushi Dhanuka, Sina Khoshfetrat Pakazad, Henrik Ohlsson

General AI

Production data integration is bottlenecked by repeated, lossy handoffs between data owners, engineers, and analysts who must collaboratively discover, structure, and query enterprise data. We present Data Intelligence Agents (DIA), a system of three agents (Data Interpreter, Schema Creator, and Query Generator) that c…

Review: pending · Role: unreviewed · Read: now
arxiv Score 13.3

HT-Bench: Benchmarking and Learning Dexterous Full-Hand Tactile Representations with Egocentric Vision

2026-06-17 · Yuzhe Huang, Jiaping Wu, Jiaming Jiang, Hezhe Lin, Aikebaier Aierken, Yunlong Wang, Kun Cheng, Ziyuan Jiao, Yuanxin Zhong

General AI

Establishing a universal benchmark for tactile representation learning in robotic manipulation remains challenging due to the diversity of tactile sensor designs, data formats, and robot embodiments. Rather than seeking to establish such a benchmark, we explore a scalable and promising direction for future development: egocentric …

Review: pending · Role: unreviewed · Read: now
arxiv Score 12.3

A Mixed-Reality Testbed for Autonomous Vehicles

2026-06-17 · H. M. Sabbir Ahmad, Ehsan Sabouni, Emrullah Celik, Zean Wan, Damola Ajeyemi, Christos G. Cassandras, Wenchao Li

General AI

We propose a mixed-reality, hardware-in-the-loop (HIL) testbed for autonomous vehicles that seamlessly integrates a physical testbed of mobile robots with a high-fidelity simulation environment. The virtual simulation enables the creation of diverse, safety-critical driving scenarios to validate state-of-the-art percep…

Review: pending · Role: unreviewed · Read: now
huggingface Score 11.5

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

2026-06-16 · Yatai Ji, An-Chieh Cheng, Yang Fu, Yukang Chen, Han Zhang, Zhaojing Yang, Wei Huang, Ka Chun Cheung, Song Han, Vidya Nariyambut Murali, Pavlo Molchanov, Jan Kautz, Simon See, Hongxu Yin, Ping Luo, Sifei Liu

General AI

Spatial VLMs have made substantial progress in geometric perception, yet complex spatial reasoning requiring multi-step inference over depth, distance, and scene relations remains challenging. Moreover, different spatial queries call for fundamentally different strategies: some are best addressed through purely linguis…

Review: pending · Role: unreviewed · Read: now
huggingface Score 11.5

Sumi: Open Uniform Diffusion Language Model from Scratch

2026-06-17 · Mengyu Ye, Keito Kudo, Wataru Ikeda, Ryosuke Matsuda, Keisuke Sakaguchi, Jun Suzuki

General AI

Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large …

Review: pending · Role: unreviewed · Read: now
arxiv Score 11.3

A Multi-Domain Benchmark for Detecting AI-Generated Text-Rich Images from GPT-Image-2

2026-06-17 · Yijin Wang, Shuyi Wang, Wenhan Zhang, Yuqi Ouyang

General AI

Text-rich images often contain privacy-sensitive, transactional, or decision-relevant information. As recent multimodal image generation models become increasingly capable of synthesizing realistic textual content and structured visual designs, detecting AI-generated text-rich images has become an important challenge f…

Review: pending · Role: unreviewed · Read: now
arxiv Score 11.3

Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

2026-06-17 · Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev, Daria Pugacheva, Albina Burlova, Mikhail Kolosov, Denis Shepelev, Andrey Kuznetsov, Elena Tutubalina, Aleksandr I. Panov, Alexey K. Kovalev, Vlad Shakhuro

Research Track A · General AI

Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet it is unclear how much commonsense and factual knowledge they retain after adaptation. Failures on knowledge-sensitive tasks are ambiguous, conflating missing knowledge with poor generalizat…

Review: pending · Role: unreviewed · Read: now
arxiv Score 11.3

Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA

2026-06-17 · Ikram Belmadani, Oumaima El Khettari, Carlos Ramisch, Frederic Bechet, Richard Dufour, Benoit Favre

General AI

The development of large language models (LLMs) has led to an increased focus on their adaptation to specialized domains and languages, yet the effectiveness of domain adaptation strategies remains unclear. We present a study of medical domain adaptation using French medical question-answering (QA) as a case study. We …

Review: pending · Role: unreviewed · Read: now
huggingface Score 10.5

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

2026-06-17 · Zijian Wang, Hanqi Li, Ziyue Yang, Zijian Hu, Shenghan Zuo, Yunzhe Zhang, Da Ma, Danyu Luo, Chenrun Wang, Jing Peng, Tiancheng Huang, Sijia Guo, Huayang Wang, Zichen Zhu, Senyu Han, Yilu Cao, Kai Yu, Lu Chen

General AI

AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspe…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.5

HANSEL: Extracting Breadcrumbs from Web Agent Trajectories for Interactive Verification

2026-06-17 · Yujin Zhang, Daye Nam

Research Track B · General AI

AI web agents can perform complex, multi-step tasks such as searching for products, comparing options, and making purchases on behalf of users. However, verifying the correctness of an agent's output remains difficult. Existing transparency mechanisms, including full trajectory logs, source links, screenshots, and LLM-…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.3

Invertible Neural Network Adapter for One-Step Flow Matching in Robot Manipulation

2026-06-17 · Yu Zhang, Kangyi Ji, Yongxiang Zou, Rongtao Xu, Feng Zheng, Long Cheng

General AI

This paper presents an invertible neural network adapter for general robotic manipulation, designed to generate precise high-dimensional actions conditioned on multimodal observations, including visual, linguistic, and proprioceptive inputs, through a one-step denoising process. Built upon a flow-matching formulation, …

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.3

Structured Inference with Large Language Gibbs

2026-06-17 · Sanghyeok Choi, Henry Gouk, Esmeralda S. Whitammer

General AI

The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a difficult inference problem. We propose Large Language Gibbs, a scheme for structured probabilist…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.3

X+Slides: Benchmarking Audience-Conditioned Slide Generation

2026-06-17 · Haodong Chen, Xuanhe Zhou, Wei Zhou, Xinyue Shao, Yanbing Zhu, Bo Wang, Jiawei Hong, Anya Jia, Fan Wu

General AI

Automatically generating slide decks from source documents is an important application of large language models (LLMs). Existing benchmarks primarily assess slide completeness and technical depth, while overlooking the target audience as a critical real-world factor. For instance, specialists demand rigorous proofs, wh…

Review: pending · Role: unreviewed · Read: now
arxiv Score 8.3

CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts

2026-06-17 · Po-Han Cheng, Chia-Mu Yu, Ying-Dar Lin, Yu-Sung Wu, Wei-Bin Lee

General AI

Code large language models increasingly retrieve external code context from repositories, documentation, issue threads, and coding-agent environments, creating an indirect prompt-injection surface where attackers hide instructions in comments, strings, identifiers, or decoy code. We propose CodeSentinel, a three-layer …

Review: pending · Role: unreviewed · Read: soon
huggingface Score 7.5

Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

2026-06-04 · Shaoyang Xu, Jingshen Zhang, Long P. Hoang, Jinyuan Li, Wenxuan Zhang

General AI

Multicultural multi-agent systems are increasingly deployed in globally diverse settings, where different agents are grounded in different cultural backgrounds. Existing cultural evaluation focuses on value alignment: how closely a single agent matches a target culture. Yet alignment is a per-agent property and cannot …

Review: pending · Role: unreviewed · Read: soon
huggingface Score 7.5

CEO-Bench: Can Agents Play the Long Game?

2026-06-16 · Haozhe Chen, Karthik Narasimhan, Zhuang Liu

General AI

Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring informa…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 7.5

Kairos: A Native World Model Stack for Physical AI

2026-06-16 · Kairos Team, Fei Wang, Shan You, Qiming Zhang, Tao Huang, Zuoyi Fu, Zhisheng Zheng, Yunlong Xi, Feng Lv, Xiaoming Wu, Zeyu Liu, Cong Wan, Pu Li, Ruiqing Yang, Xiaoou Li, Wei Wang, Kangkang Zhu, Yuwei Zhang, Shi Fu, Zheng Zhang, Xiaoning Wu, Xuzeng Fan, Dacheng Tao, Xiaogang Wang

General AI

World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constraints. We introduce Kai…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.5

Splaxel: Efficient Distributed Training of 3D Gaussian Splatting for Large-scale Scene Reconstruction via Pixel-level Communication

2026-06-17 · Wenqi Jia, Zhewen Hu, Ying Huang, Yu Gong, Stavros Kalafatis, Yuke Wang, Wei Niu, Chengming Zhang, Ang Li, Sheng Di, Yuede Ji, Bo Fang, Miao Yin

Research Track A

3D Gaussian Splatting (3DGS) enables high-fidelity and real-time 3D scene reconstruction, but scaling training to large-scale scenes requires optimizing hundreds of millions of Gaussians across multiple GPUs. Existing distributed approaches either partition scenes into isolated regions, causing global inconsistency, or…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.3

Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors

2026-06-17 · Michael Finkelson, Daniel Segal, Eitan Richardson, Shahar Armon, Nani Goldring, Poriya Panet, Nir Zabari, Benjamin Brazowski, Or Patashnik, Yoav HaCohen

General AI

Existing multi-speaker dialogue systems bind speakers to utterances through structured supervision: per-turn tags, multi-stream transcriptions, or learnable speaker embeddings. These systems operate within speech-only pipelines that produce clean vocal sequences without the ambient texture of real conversations. We tak…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.3

Risk Stratification for ICU Delirium using Pervasive Ambient Sensing Information

2026-06-17 · Jiaqing Zhang, Sabyasachi Bandyopadhyay, Miguel Contreras, Jessica Sena, Yuanfang Ren, Andrea Davidson, Ziyuan Guan, Tezcan Ozrazgat-Baslanti, Subhash Nerella, Azra Bihorac, Parisa Rashidi

General AI

Delirium is a common and serious complication in the Intensive Care Unit (ICU), associated with increased morbidity, prolonged hospital stays, and higher healthcare costs. Despite its prevalence, early prediction and prevention remain challenging. Environmental factors such as ambient sound and light may influence the …

Review: pending · Role: unreviewed · Read: soon
huggingface Score 6.5

Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

2026-06-16 · Jingyuan Huang, Zuming Huang, Yucheng Shi, Tianze Yang, Xiaoming Zhai, Wei Chu, Ninghao Liu

General AI

Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promising post-training approach for this coordinate-sensitive task, since it provides dense to…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 6.5

Physics-IQ Verified

2026-06-17 · Tim Rädsch, Yuki M Asano, Hilde Kuehne, Stefan Bauer, Priyank Jaini, Robert Geirhos, Carsten T. Lüth

General AI

Video generative models (VGMs) have become a new frontier that can be used not just for video generation but for a multitude of downstream tasks, including world modeling. To advance these tasks, a good video model must understand the physical reality of the world. Evaluating this understanding is an emerging field an…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 6.3

CABLE: Cloud-Assisted Bandwidth-efficient LMM-based Encoding for V2X Systems

2026-06-17 · Haohua Que, Zhipeng Bao, Qianyi Wu, Handong Yao

General AI

Cloud-hosted large multimodal models (LMMs) can provide strong open-vocabulary perception for Vehicle-to-Everything systems, but naively transmitting full-resolution frames from edge to cloud causes severe communication overhead and high cloud-side prefill latency. We present CABLE, a cloud-assisted bandwidth-efficient…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 5.3

Optimal scenario design for climate emulation

2026-06-17 · Christopher B. Womack, Shahine Bouabid, Andrei Sokolov, Popat Salunke, Glenn Flierl, Sebastian D. Eastham, Noelle E. Selin

General AI

As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenario…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 5.3

Zero-Shot Long-Horizon Dexterous Manipulation via Multi-View 3D-Grounded VLM Reasoning

2026-06-17 · Jisoo Kim, Sangwon Baik, Taeksoo Kim, Sungjoo Kim, Junyoung Lee, Mingi Choi, Hanbyul Joo

General AI

We present a zero-shot framework for long-horizon dexterous manipulation that grounds language instructions into executable 3D task plans from calibrated multi-view RGB images. Rather than training an end-to-end policy, our system uses a vision-language model (VLM) to produce reference-frame task grounding and primitiv…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 4.3

Accelerating Network-Agent Dispersion: Territorial Behavior and Directionally Biased Lazy Random Walks

2026-06-17 · Li Zeng, Steve Alpern

General AI

Territorial behavior can greatly accelerate decentralized agent dispersion on networks. This paper studies a network-agent dispersion problem in which m autonomous agents move in discrete time on a connected graph and seek a configuration in which no two agents occupy the same node. We focus on the dispersion case m = …

Review: pending · Role: unreviewed · Read: later
arxiv Score 4.3

Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning

2026-06-17 · Youngwoo Cho, Seunghoon Yi, Wooil Yang, Sungmo Kang, Young-woo Son, Jaegul Choo, Joonseok Lee, Soo Kyung Kim, Hongkee Yoon

General AI

Pre-trained materials foundation models, or machine learning interatomic potentials, leverage general physicochemical knowledge to effectively approximate potential energy surfaces. However, they often require domain-specific calibration due to physicochemical diversity as well as mismatches between practical computati…

Review: pending · Role: unreviewed · Read: later