Research Paper Cockpit

Daily Digest - 2026-06-16

Papers first seen in this daily snapshot.

Papers

67 visible entries

arxiv Score 26.0

Are Online Skill and Memory Modules Always Worth Their Tokens? A Budget-Constrained Study of Web Agents

2026-06-12 · Sina Hajimiri, Masih Aminbeidokhti, Jose Dolz, Ismail Ben Ayed, Issam H. Laradji, Spandana Gella, Nicolas Gontier

Research Track B · General AI

Online web agents often augment a base actor with memory, workflow, or skill modules. These modules can improve performance, but they also consume test-time tokens, a cost rarely reported alongside the actor's inference cost. We study online augmentation, where this overhead is paid on every task, and re-evaluate its b…

Review: pending · Role: unreviewed · Read: now
arxiv Score 24.3

Context-Aware RL for Agentic and Multimodal LLMs

2026-06-15 · Peiyang Xu, Bangzheng Li, Sijia Liu, Karthik R. Narasimhan, Pramod Viswanath, Prateek Mittal, Xingyu Fu

General AI

Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle detail in an image. We propose ContextRL, a context-aware reinforcement learning (RL) method that improves long-horizon r…

Review: pending · Role: unreviewed · Read: now
arxiv Score 20.3

Gen-VCoT: Generative Visual Chain-of-Thought Reasoning via Diffusion-Based RGB Intermediate Representations

2026-06-15 · Zhiqiang Zhou, Junliang Dai, Xu Ling

General AI

Multimodal large language models (MLLMs) excel at visual reasoning but rely on text-based chain-of-thought (CoT), lacking interpretable visual intermediates. Existing methods use opaque tokens or external tools, missing key properties. We propose Gen-VCoT, a framework using expert vision models to generate RGB images a…

Review: pending · Role: unreviewed · Read: now
arxiv Score 19.0

Bridging Geographic Bias in Urban Streetscape Inference via Lifelong Learning with Visual-Semantic Pivoting

2026-06-13 · Xinze Zhang

Research Track A · General AI

Visual perception of urban streetscapes underpins evidence-based decisions in landscape planning, public health, and place-making. Yet models trained on a few well-photographed metropolises systematically misjudge underrepresented districts, propagating geographic bias into downstream policy. We address this gap with H…

Review: pending · Role: unreviewed · Read: now
arxiv Score 19.0

KeepLoRA++: Continual Learning with Layer-Scaled Residual Gradient Adaptation

2026-06-15 · Mao-Lin Luo, Yi-Lin Zhang, Zi-Hao Zhou, Yankun Hong, Xialiang Tong, Mingxuan Yuan, Tong Wei, Min-Ling Zhang

Research Track A · General AI

Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents KeepLoRA++, balancing these objectives through a u…

Review: pending · Role: unreviewed · Read: now
arxiv Score 18.5

Learning New Tasks via Reusable Skills: Skill-Compositional Experts for Embodied Continual Learning

2026-06-14 · Shuaike Zhang, Shaokun Wang, Haoyu Tang, Jianlong Wu, Liqiang Nie

Research Track A · General AI

Embodied Continual Learning (ECL) aims to enable robots to continually acquire new manipulation tasks while retaining previously learned behaviors under closed-loop control. Compared with conventional continual learning, ECL suffers from more severe catastrophic forgetting. Feature drift accumulated under closed-loop c…

Review: pending · Role: unreviewed · Read: now
arxiv Score 18.3

Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification

2026-06-15 · Truong Thanh Hung Nguyen, Khanh Van Quynh Nguyen, Hoang-Loc Cao, Tri Duong, Phuc Ho, Van Pham, Loc Nguyen, Hung Cao

General AI

Accurate Harmonized Tariff Schedule (HTS) code classification is essential for customs clearance, duty assessment, trade statistics, and regulatory compliance in maritime logistics. However, exact HTS classification remains challenging because product descriptions are often short, incomplete, or ambiguous, while correc…

Review: pending · Role: unreviewed · Read: now
arxiv Score 18.0

LabOSBench: Benchmarking Computer Use Agents for Scientific Instrument Control

2026-06-15 · Anqi Zou, Han Deng, Chengyu Zhang, Junquan Hu, Yu Wang, Yuxiang Xing, Aokai Zhang, Hanling Zhang, Zhaoyang Liu, Ben Fei, Zhihui Wang, Wanli Ouyang

Research Track B · General AI

Current computer-use benchmarks primarily focus on software operation tasks in virtualized systems, whereas scientific instrumentation scenarios require coordinated control over complex interfaces, and feedback-driven parameter adjustment. However, directly evaluating agents on physical high-precision instruments is im…

Review: pending · Role: unreviewed · Read: now
arxiv Score 17.3

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

2026-06-15 · Anzhe Xie, Weihang Su, Yujia Zhou, Yiqun Liu, Qingyao Ai

General AI

Meta-analysis is a demanding form of evidence synthesis that combines literature retrieval, PI/ECO-guided study selection, and statistical aggregation. Its structured, verifiable workflow makes it an ideal substrate for evaluating systematic scientific reasoning, yet existing benchmarks lack ground truth across the ful…

Review: pending · Role: unreviewed · Read: now
arxiv Score 17.3

DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents

2026-06-15 · Minghang Zhu, Chuyang Wei, Junhao Xu, Yilin Cheng, Zhumin Chen, Jiyan He

General AI

Deep research agents synthesize long-form reports by searching and reasoning over retrieved evidence. Reinforcement learning with rubric-based rewards improves these agents by optimizing them against checkable criteria that translate report quality into reward signals, but its efficiency depends on whether those criter…

Review: pending · Role: unreviewed · Read: now
arxiv Score 17.3

What Should a Streaming Video Model Remember?

2026-06-15 · Haonan Ge, Yiwei Wang, Hang Wu, Yujun Cai

Research Track A · General AI

Streaming video understanding models must answer queries at any moment during an ongoing stream, using only what they have observed so far and under fixed memory and computation budgets. Existing methods address this by adding memory banks, retrieval modules, or visual token compression to preserve long-range history. …

Review: pending · Role: unreviewed · Read: now
arxiv Score 17.0

Continual Backdoor Training in IoT/CPS

2026-06-12 · Oxana Salish, Kuniyilh S

Research Track A

Internet of Things (IoT) and Cyber-physical systems (CPS) increasingly rely on continual learning (CL) to adapt to evolving environments, device heterogeneity, and concept drift, thereby improving overall utility. While continual adaptation is essential for long-lived IoT deployments where data patterns evolve, it also…

Review: pending · Role: unreviewed · Read: now
arxiv Score 16.5

Medical Heuristic Learning: An LLM-Driven Framework for Interpretable and Auditable Clinical Decision Rules

2026-06-15 · Wei Xu, Ke Yang, Gang Luo, Keli Zheng, Lingyan Hu, Jing Wang, Kefeng Li

Research Track A · General AI

Predictive modeling for clinical tabular data is central to clinical decision support and therefore requires not only strong predictive performance but also transparent decision logic. Although deep learning and tree-based ensemble methods can achieve high accuracy, their black-box nature remains a major obstacle to cl…

Review: pending · Role: unreviewed · Read: now
arxiv Score 15.3

DYNA: Dynamic Episodic Memory Networks for Augmenting Large Language Models with Temporal Knowledge Graphs in Continuous Learning

2026-06-14 · Ali Sarabadani, Mahtab Tajvidiyan

Research Track A · General AI

Large Language Models (LLMs) struggle to incorporate new knowledge without forgetting or costly retraining. We propose DYNA, a lightweight framework that augments a frozen LLM with a temporal knowledge graph where events are nodes and temporal relations are directed, timestamped edges. The graph serves as an external, …

Review: pending · Role: unreviewed · Read: now
arxiv Score 15.3

Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering

2026-06-15 · Sanjay Basu

General AI

Aggregate accuracy benchmarks conceal a systematic structure in how large language models fail at electronic health record (EHR) question answering: questions requiring more inferential steps produce disproportionately more errors. Motivated by theoretical results on transformer compositionality limits, we introduce a …

Review: pending · Role: unreviewed · Read: now
arxiv Score 15.3

Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter

2026-06-15 · Patomporn Payoungkhamdee, Napat Laosaengpha, Jenta Wonglertsakul, Pittawat Taveekitworachai, Pume Tuchinda, Panjapong Poobanchuen, Ekapol Chuangsuwanich, Can Udomcharoenchaikit, Samuel Cahyawijaya, Peerat Limkonchotiwat, Sarana Nutanong

General AI

Reasoning with a Code Interpreter (CI) has emerged as an effective paradigm for enhancing the reasoning capabilities of large language models (LLMs) through executable computation and iterative verification. Despite its growing adoption, the behavioral properties underlying effective code reasoning remain largely under…

Review: pending · Role: unreviewed · Read: now
arxiv Score 15.0

Understanding Cross-Modal Contributions in Continual Vision-Language Models: A Theoretical Perspective

2026-06-12 · Salimeh Sekeh, Mary Wisell

Research Track A · General AI

Continual vision-language models are commonly addressed through sequential fine-tuning; however, although this paradigm enables adaptation to new environments (tasks), it inherently emphasizes the contribution of previously learned environments (tasks) at the expense of the stability required to preserve previously acq…

Review: pending · Role: unreviewed · Read: now
arxiv Score 14.3

LLM4RTL: Tool-Assisted LLM for RTL Generation

2026-06-13 · Jing Jin, Robert Chu, Ning Yan, Masood S. Mortazavi

General AI

Large language models (LLMs) have facilitated impressive progress in software engineering, code generation, tooling, and systems. Concurrently, a significant body of research has developed which explores a growing variety of methods and systems for applying LLMs to hardware and chip design (e.g., systems for RTL code g…

Review: pending · Role: unreviewed · Read: now
arxiv Score 14.3

Evolution & Foundation: AI Shares Creative Control

2026-06-15 · Dylan Banarse, Stephen Todd, William Latham, Frederic Fol Leymarie

General AI

This paper investigates the creative process of automated design and artistic evaluation using an evolutionary system. We consider how a multimodal artificial intelligence (AI) model can communicate and guide a combined generative and evolutionary computational system. This creates a framework for the evolution of aest…

Review: pending · Role: unreviewed · Read: now
arxiv Score 14.3

FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection

2026-06-15 · Y. H. Zhou, Z. M. Ma, Y. J. Zhou, Y. T. Li, H. X. Xiang, Y. M. Cheng, T. L. Chen, K. J. Zhang, Z. H. Nan, J. H. Ni, Z. Wu, Q. Y. Pan, S. Zhang, S. Cheng, M. Y. Luo

Research Track B · General AI

SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on message-only smishing classification or expose URL and domain cues that allow models to …

Review: pending · Role: unreviewed · Read: now
arxiv Score 13.3

LESS Is More: Mutual-Stability Sampling for Diffusion Language Models

2026-06-15 · Amr Mohamed, Guokan Shang, Michalis Vazirgiannis

General AI

Diffusion large language models (dLLMs) offer a promising alternative to autoregressive decoding by iteratively refining masked sequences, enabling parallel token updates and bidirectional conditioning. Their practical efficiency, however, is limited by sampling procedures that execute a fixed number of reverse denoisi…

Review: pending · Role: unreviewed · Read: now
huggingface Score 12.5

DreamX-World 1.0: A General-Purpose Interactive World Model

2026-06-15 · DreamX Team, Yancheng Bai, Rui Chen, Xiangxiang Chu, Rujing Dang, Hao Dou, Bingjie Gao, Qiwen Gu, Siyu Hong, Jiachen Lei, Geng Li, Jifan Li, Ruimin Lin, Qingfeng Shi, Bingze Song, Lei Sun, Jing Tang, Ruitian Tian, Jun Wang, Jiahong Wu, Pengfei Zhang, Shen Zhang, Jiashu Zhu

General AI

DreamX-World 1.0 is a general-purpose interactive text/image-to-video world model for controllable long-horizon generation. It supports camera navigation, revisits to previously observed regions, and promptable events across photorealistic, game-style, and stylized domains. Our data engine combines camera-accurate Unre…

Review: pending · Role: unreviewed · Read: now
arxiv Score 12.3

TokenPilot: Cache-Efficient Context Management for LLM Agents

2026-06-15 · Buqiang Xu, Zirui Xue, Dianmou Chen, Chenyang Fu, Chiyu Wu, Caiying Huang, Chen Jiang, Jizhan Fang, Xinle Deng, Yijun Chen, Yunzhi Yao, Xuehai Wang, Jin Shang, Gong Yu, Ningyu Zhang

General AI

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alter layouts, introducing prefix mismatches and cache invalidation. This…

Review: pending · Role: unreviewed · Read: now
arxiv Score 12.0

GRASP: Gradient-Aligned Sequential Parameter Transfer for Memory-Efficient Multi-Source Learning

2026-06-12 · Mary Isabelle Wisell, Nicholas Jacobs, Aayush Manandhar, Salimeh Yasaei Sekeh

Research Track A · General AI

Multi-source transfer learning faces a fundamental scalability bottleneck: existing approaches require either loading all K source models into memory simultaneously during parameter fusion, requiring O(K) memory, or deploying all models at inference time, making production deployment infeasible. We propose GRASP (Gradi…

Review: pending · Role: unreviewed · Read: now
arxiv Score 11.3

A Multi-Center Benchmark for Abdominal Disease Diagnosis and Report Generation from Non-Contrast CT

2026-06-15 · Mariam Elbakry, Aliaa Sayed Sheha, Salma Hassan Tantawy, Aya Yassin, Concetto Spampinato, Karim Lekadir, Xiaomeng Li, Marawan Elbatel

General AI

Multiphasic contrast-enhanced CT (CECT) is widely used for abdominal lesion characterization, yet it carries inherent risks of contrast-induced nephropathy, escalates acquisition burden, and heavily contributes to radiologist workload. To address these challenges, we introduce a novel multi-center benchmark for multi-o…

Review: pending · Role: unreviewed · Read: now
arxiv Score 11.3

Selection Without Signal, Recovery Through Expression: A Measurement Study of Post-Hoc Falsification Operators for Frozen Small Code Models

2026-06-15 · Mehmet Iscan

General AI

Frozen small code models (<=1.5B parameters, run locally without fine-tuning) suit offline and privacy-constrained use, but often emit plausible-but-wrong programs. A natural remedy is a post-hoc operator that selects, verifies, repairs, or re-processes the model's samples without retraining; in principled form it is P…

Review: pending · Role: unreviewed · Read: now
huggingface Score 11.2

Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking

2026-04-08 · Jiwan Chung, JiHyuk Byun, Vibhav Vineet, Seon Joo Kim

Research Track B · General AI

Web agents act through long interaction sequences, yet existing benchmarks evaluate only terminal success, discarding all process information and offering little guidance on improvement. In this work, we conduct a process-level analysis of web agents. We introduce WebStep, a benchmark of 1,800 task instances with contr…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 10.5

Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization

2026-06-11 · Hexuan Yu, Chaoyu Zhang, Heng Jin, Shanghao Shi, Ning Zhang, Y. Thomas Hou, Wenjing Lou

Research Track B · General AI

Modern LLM-powered autonomous agents increasingly rely on rich user interface (UI) state observations to achieve reliable action grounding in complex digital environments. However, many deployments transmit the full UI state to remote inference servers even when most elements are irrelevant to the current task, which c…

Review: pending · Role: unreviewed · Read: now
huggingface Score 10.5

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

2026-06-12 · Chenxin Li, Zhengyao Fang, Zhengyang Tang, Pengyuan Lyu, Xingran Zhou, Xin Lai, Fei Tang, Liang Wu, Yiduo Guo, Weinong Wang, Junyi Li, Yi Zhang, Yang Ding, Huawen Shen, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang, Chengquan Zhang, Han Hu

General AI

Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone…

Review: pending · Role: unreviewed · Read: now
huggingface Score 10.5

Who Flips? Self- and Cross-Model Counterarguments Reveal Answer Instability in LLMs

2026-06-14 · Nafiseh Nikeghbal, Amir Hossein Kargaran, Shaghayegh Kolli, Jana Diesner

General AI

Standard accuracy benchmarks are designed to test how closely large language models (LLMs) approach correct answers, but are not suitable for testing whether LLMs stick with a correct answer when that answer is challenged by a plausible counter-argument. We introduce a controlled protocol for evaluating answer stabilit…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.3

Bridging the Usability Gap: Lessons from Interpreting Studies for Machine Interpreting Design

2026-06-14 · Claudio Fantinuoli

General AI

Machine interpreting (MI), the live, real-time branch of speech translation, has achieved remarkable progress on standard benchmarks, with some systems approaching human parity on textual fidelity. Yet the user experience remains far inferior to interpreter-mediated communication, revealing what we term the accur…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.3

Agent trajectories as programs: fingerprinting and programming coding-agent behavior

2026-06-15 · Hamidah Oderinwale

General AI

Benchmark scores tell you what an agent got right; they do not tell you how it got there. In this work, we introduce methods for comparing agents procedurally in different contexts, where the model, tasks, and approaches vary. We compare ten agents and find that they are identifiable by their behavioral habits, which w…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.3

Bayesian Inference and Decision Audits for Public Archives of Frontier AI Evaluations

2026-06-15 · Yanan Long

General AI

Public AI evaluations are often read as terminal leaderboards, yet the underlying evidence is a selective time series shaped by reporting rules, benchmark revisions, and missingness. Repeated public archives for LiveBench and Open LLM Leaderboard v2 serve as the primary longitudinal record; LMArena provides a preferenc…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.3

Geometric Action Model for Robot Policy Learning

2026-06-15 · Jisang Han, Seonghu Jeon, Jaewoo Jung, René Zurbrügg, Honggyu An, Tifanny Portela, Marco Hutter, Marc Pollefeys, Seungryong Kim, Sunghwan Hong

General AI

Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors from large-scale foundation models, but the…

Review: pending · Role: unreviewed · Read: now
huggingface Score 9.5

Implicit Reasoning for Large Language Model-based Generative Recommendation

2026-06-15 · Yinhan He, Liam Collins, Bhuvesh Kumar, Jundong Li, Neil Shah, Donald Loveland

General AI

Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet reliably invoking this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs), disruptin…

Review: pending · Role: unreviewed · Read: now
huggingface Score 9.5

PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory

2026-06-15 · Shuai Yang, Bingjie Gao, Ziwei Liu, Jiaqi Wang, Dahua Lin, Tong Wu

General AI

Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs struggle to maintain long-term consistency after such modifications, as stored contexts may …

Review: pending · Role: unreviewed · Read: now
arxiv Score 9.3

A RAG-Enhanced Bi-Level Cognitive Orchestration Framework for LEO Satellite Networks

2026-06-13 · Yuhong Jiang, Zhishu Shen, Tong Yin, Qiushi Zheng, Yichao Jin, Fidan Mehmeti, Jiong Jin

General AI

The rapid growth of remote sensing data in Low Earth Orbit (LEO) satellite networks is increasingly constrained by limited downlink capacity to terrestrial networks. Satellite edge computing alleviates this pressure by enabling in-orbit data processing. However, it introduces a new challenge of spatio-temporal resource…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 9.3

R2RDreamer: 3D-aware Data Augmentation for Spatially-generalized 2D Manipulation Policies

2026-06-15 · Xiuwei Xu, Haowen Sun, Angyuan Ma, Yiwei Zhang, Zhenyu Wu, Xiaofeng Wang, Bingyao Yu, Zheng Zhu, Jie Zhou, Jiwen Lu

General AI

Spatial generalization is critical for imitation-learned manipulation policies, but achieving it typically requires scaling demonstrations across diverse object poses, robot configurations, and camera viewpoints. Data augmentation from a few source demonstrations offers a practical alternative to costly real-world coll…

Review: pending · Role: unreviewed · Read: now
arxiv Score 9.3

Scalable Circuit Learning for Interpreting Large Language Models

2026-06-15 · Naiyu Yin, Dennis Wei, Tian Gao, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Yue Yu

General AI

A prominent research direction in mechanistic interpretability is learning sparse circuits over LLM components to reveal how they jointly produce model behavior. However, raw neurons are polysemantic, making learned circuits hard to interpret. Sparse autoencoder (SAE) features alleviate this, but their high dimensional…

Review: pending · Role: unreviewed · Read: now
arxiv Score 9.0

A Compositional Framework for Open-ended Intelligence

2026-06-13 · Ida Momennejad, Roberta Raileanu

Research Track A

Open-ended intelligence is the capacity to adapt to novel problems and environments that are substantially different from those in training. A mathematics of open-ended intelligence requires two pillars: first, a minimal set of representational primitives (e.g., states, actions) and algorithmic primitives (e.g., neares…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 9.0

SCAN: A Decision-Making Framework for Effective Task Allocation with Generative AI

2026-06-14 · Fendi Tsim, Alina Gutoreva

Research Track A

We introduce SCAN -- a human-centric decision-making framework to facilitate learners for effective task allocation with Generative Artificial Intelligence (GenAI) based on Vygotsky's Zone of Proximal Development and Metacognition. In SCAN, we systematize and formalize AI-human interaction by introducing a task-identif…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 8.5

JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence

2026-06-10 · Dingyu Yao, Junhao Zhou, Chenxu Yang, Chuanyu Qin, Haowen Hou, Zheming Liang, Congcong Wang, Yuhang Cao, Shenglong Ye, Shuai Xie, Shuhuan Gu, Haoyang Huang, Qingyi Si, Nan Duan, Jiaqi Wang

General AI

Many moments in the real world do not wait for a user to ask. A fire starts on a security monitor, an expression flickers across a video call, or a product a viewer wants flashes by in a livestream. Yet today's large models remain mostly turn-based by design: they answer only when addressed, and even video-call apps th…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.3

Clay-CNN Hybrids: Leveraging Geospatial Foundation Models as Auxiliary Context for Landslide Detection

2026-06-12 · Huong Binh Vu

General AI

Rapid post-event landslide mapping is essential for disaster response but remains difficult to automate due to extreme class imbalance. This study evaluates whether Clay v1.5, a Geospatial Foundation Model (GFM), can improve pixel-level landslide segmentation on the Landslide4Sense (L4S) benchmark, which contains 3,799…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.3

LatentGym: A Testbed For Cross-Task Experiential Learning With Controllable Latent Structure

2026-06-13 · Daksh Mittal, Tommaso Castellani, Thomson Yen, Naimeng Ye, Fangyu Wu, Minghui Chen, Tiffany Cai, Emmanouil Koukoumidis, William Zeng, Hongseok Namkoong

General AI

We envision continually learning agentic systems that become more useful over time: as they encounter sequences of related tasks, they should infer the hidden structure shared across those tasks and use it to improve future decisions. This cross-task experiential learning capability is pivotal in domains such as person…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.3

Toward the Whole Picture: Accumulative Fingerprint Mapping and Reconstruction for Small-Area Mobile Sensors

2026-06-14 · Xiongjun Guan, Jianjiang Feng, Jie Zhou

Research Track A · General AI

Small-area fingerprint sensing on mobile devices creates a fundamental mismatch between acquisition and recognition: each touch captures only a tiny, pose-varying local patch, while reliable biometric matching ultimately requires a stable and sufficiently complete fingerprint representation. Existing pipelines largely …

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.3

ExpRL: Exploratory RL for LLM Mid-Training

2026-06-15 · Violet Xiang, Amrith Setlur, Chase Blagden, Nick Haber, Aviral Kumar

General AI

Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as d…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 8.0

Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

2026-06-15 · Hyungmin Kim, Minsoo Kim, Hongseok Kim, Jungwook Choi

Research Track A · General AI

Multi-turn LLM serving accumulates dialogue history whose Key-Value (KV) cache grows with every turn and every user, quickly exceeding the model weights themselves and making memory -- not compute -- the binding constraint on throughput. Non-uniform KV compression, which allocates heterogeneous budgets across attention…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 7.5

Artificial Intelligence Index Report 2026

2026-04-14 · Sha Sajadieh, Loredana Fattorini, Raymond Perrault, Yolanda Gil, Vanessa Parli, Lapo Santarlasci, Juan Pava, Nestor Maslej, Russ Altman, Erik Brynjolfsson, Carla Brodley, Jack Clark, Virginia Dignum, Vipin Kumar, James Landay, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Elham Tabassi, Russell Wald, Toby Walsh, Dan Weld

General AI

Welcome to the ninth edition of the AI Index report. As AI continues to advance rapidly, the question becomes whether the systems built around it can keep up. Governance frameworks, evaluation methods, education systems, and the data infrastructure needed to track AI's impact are struggling to match the pace of the tec…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 7.5

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

2026-06-02 · Sanket Badhe, Deep Shah

General AI

Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To addres…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.5

Open-World Video Segmentation

2026-06-14 · Qing Su, Kaiyang Li, Yuan Zhuang, Fei Miao, Shihao Ji

Research Track A · General AI

While video segmentation has advanced rapidly on short clips and closed-set benchmarks, open-world video segmentation remains largely unexplored. The challenge is twofold: (1) existing methods are not designed to support object discovery and identity maintenance in long videos of dynamic ego-motion, and (2) existing ev…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.3

Beyond LoRA: Is Sparsity-Induced Adaptation Better?

2026-06-11 · Elijah Cadenhead, Cristian McGee, Xin Li, El Houcine Bergou, Aritra Dutta

General AI

Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how the structural restrictions on low-rank updates preserve effective adaptation performanc…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.3

Data-Driven Decoding of Russell's Circumplex Model of Affect

2026-06-15 · Amdjed Belaref, Samir Sadok, Zineb Noumir, Renaud Seguier

General AI

Affective computing increasingly relies on deep learning to represent emotions, yet latent spaces often remain opaque, high-dimensional black boxes. This paper investigates whether Transformers' embeddings recover the geometric regularities of Russell's circumplex model. We unify two complementary experiments testing t…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.3

Exact Posterior Score Estimation for Solving Linear Inverse Problems

2026-06-15 · Abbas Mammadov, Ozgur Kara, Kaan Oktay, Iskander Azangulov, Adil Kaan Akan, Hyungjin Chung, James Matthew Rehg, Yee Whye Teh

General AI

Diffusion and flow-based models learn powerful data priors by training a denoiser to reverse Gaussian corruption. To use this prior to solve a linear inverse problem, one needs to sample from the posterior, but the score that the prior provides is the unconditional score, not the posterior score. Existing methods eithe…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.3

SoK: Security and Privacy of Foundation-Model-Powered Robots

2026-06-15 · Xueluan Gong, Chen Chen, Jinxin Liu, Qian Wang, Kwok-Yan Lam

General AI

Foundation models are reshaping robotics by enabling robots to interpret open-ended instructions, reason over multimodal contexts, and operate in complex, open-world environments. However, their integration also introduces security and privacy (S&P) risks that extend beyond the FMs themselves to embodied execution pipe…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 6.5

SP^3: Spherical Priors for Plug-and-Play Restoration

2026-06-15 · Sean Man, Ron Raphaeli, Matan Kleiner, Or Ronai

General AI

In this paper, we introduce SP^3, a novel Plug-and-Play algorithm that accelerates maximum a posteriori image restoration by replacing denoisers with Spherical Encoders (SE) as generative priors. SP^3 approximates the intractable proximal prior step by utilizing the SE tightly structured latent space as a robust projec…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Human Universal Grasping

2026-06-15 · Kevin Yuanbo Wu, Tianxing Zhou, Isaac Tu, Billy Yan, Irmak Guzey, David Fouhey, Dandan Shan, Lerrel Pinto

General AI

Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is from humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-spec…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 6.3

Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

2026-06-15 · Kareem Amin, Rudrajit Das, Alessandro Epasto, Adel Javanmard, Dennis Kraft, Mónica Ribero, Sergei Vassilvitskii

General AI

The rapid adoption of generative AI and Large Language Models (LLMs) has spurred interest in synthetic data as a privacy-preserving alternative to sensitive real-world datasets. However, generating high-utility synthetic data often carries the risk of memorizing and regurgitating private information from the training c…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 5.5

Selective Control under Noisy Perception: Governance Failures Hidden by Aggregate Metrics in Modular Networks

2026-06-12 · Igor Itkin

General AI

A content-moderation system can score well on every standard accuracy metric and still cause real harm, if its mistakes fall on the few users who connect otherwise separate communities. We show this in an agent-based model where N=240 learning agents on a community-structured network each post harmless, productive, or …

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.3

SDS-LoRA: Overcoming Anisotropic Gradient Scaling in Low-Rank Adaptation

2026-06-15 · Junghun Oh, Sungyong Baik, Kyoung Mu Lee

General AI

Low-Rank Adaptation (LoRA) enables efficient adaptation of large pre-trained models to downstream tasks by parameterizing weight updates with low-rank matrices. In this paper, we investigate the limitations of the LoRA parameterization from a geometric perspective. Specifically, we show that when a full fine-tuning gra…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.3

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

2026-06-15 · Mingyang Li, Yurou Liu, Jieping Ye, Bing Su, Ji-Rong Wen, Zheng Wang

General AI

In this report, we present LOGOS (Language Of Generative Objects in Science), a scientific generative language model that unifies heterogeneous tasks across the natural sciences within a single autoregressive framework based on a shared scientific grammar. It encodes diverse scientific objects and their spatial interac…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 4.5

Memento: Reconstruct to Remember for Consistent Long Video Generation

2026-06-12 · Xuan Wei, Longbin Ji, Guan Wang, Xiangrui Liu, Zhenyu Zhang, Shuohuan Wang, Yu Sun, Qingqi Hong

General AI

Long-form video generation requires recurring subjects to remain consistent across various shots, viewpoints, motions, and scene transitions. Existing temporal decomposition methods improve scalability by generating videos shot by shot. However, they mainly focus on optimizing plausible next-shot continuations without …

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

2026-06-15 · Haotian Liu, Yihao Liu, Jingwei Ni, Siyuan Huang, Xinpeng Liu, Pengyu Cheng, Jiajun Song, Ruijin Ding, Junfeng Li, Zhechao Yu, Mengyu Zhou, Hongteng Xu, Xiaoxi Jiang, Guanjun Jiang

General AI

As LLMs advance, post-training reinforcement learning (RL) increasingly relies on multi-dimensional rewards to cultivate comprehensive capabilities. This shift demands new algorithms capable of optimizing diverse and potentially competing objectives simultaneously. To address this, existing methods such as Group reward…

Review
pending
Role
unreviewed
Read
later
huggingface Score 4.5

MMDiff: Extending Diffusion Transformers for Multi-Modal Generation

2026-06-15 · Yagmur Akarken, Orest Kupyn, Christian Rupprecht

General AI

Diffusion transformers have demonstrated remarkable generative capabilities, yet the rich perceptual representations computed across their denoising trajectory are discarded once the content is rendered. We present MMDiff, a framework that transforms a frozen diffusion transformer into a multi-modal generative system t…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Prompt2Effect: Training-Free Image-to-Video Model Specialization via LoRA Generation

2026-06-11 · Xiaomeng Yang, Yanyu Li, Gordon Guocheng Qian, Ivan Skorokhodov, Viacheslav Ivanov, Avalon Vinella, Xuan Zhang, Yanzhi Wang, Sergey Tulyakov, Anil Kag

General AI

Personalizing Image-to-Video (I2V) diffusion models with specific visual effects is increasingly demanded for high-end video generation. Current practice requires training a separate Low-Rank Adaptation (LoRA) module for each effect, incurring substantial data curation and iterative optimization costs that hinder inter…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

Texture-Shape Bias Balancing for Robust Synthetic-to-Real Semantic Segmentation in Automotive NIR Imagery

2026-06-13 · Felix Stillger, Ben Hamscher, Lukas Hahn, Annika Mütze, Tobias Meisen, Kira Maag

General AI

Semantic segmentation is a fundamental component of visual perception in modern automotive systems, enabling pixel-level scene understanding. Near-Infrared imaging (NIR) offers stable detection under difficult illumination conditions, but the development of domain-specific semantic segmentation models remains challengi…

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

From Newtonian to Relativistic IAM: The Autonomous Principal as Reference Frame for Digital Identity

2026-06-15 · Philippe Page, Robert Mitwicki, Michal Pietrus

General AI

The 2023 paper "Distributed Governance: a Principal-Agent Approach to Data Governance" (arXiv:2308.07280) introduced the autonomous principal as the locus of transactional sovereignty in digital ecosystems. This follow-up, Part 2, advances a structural argument for why that model is not a normative preference but a …

Review
pending
Role
unreviewed
Read
later
arxiv Score 4.3

The Value Axis: Language Models Encode Whether They're on the Right Track

2026-06-15 · Nick Jiang, Isaac Kauvar, Jack Lindsey

General AI

We investigate whether language models internally track the value of their current trajectory, defined as the likelihood that their ongoing strategy will achieve their goals. Using synthetic, in-context reinforcement learning data, we construct a "value" axis for Qwen3-8B. We find that activations along this axis disti…

Review
pending
Role
unreviewed
Read
later