Research Paper Cockpit

Daily Digest - 2026-06-30

Papers first seen in this daily snapshot.

Research Workflow

Latest digest: 2026-07-04.

Papers

69 visible entries

arxiv Score 37.0

Neural Subspace Reallocation: Continual Learning as Retrieval-Based Subspace Memory Management

2026-06-29 · Byeong Hoon Yoon

Research Track A · General AI

We introduce Neural Subspace Reallocation (NSR), which reframes continual learning as memory management over parameter subspaces. Instead of treating Low-Rank Adaptation (LoRA) modules as disposable per-task adapters, NSR manages them as compressible, retrievable memory units on a frozen backbone through a recurring cy…

Review: pending · Role: unreviewed · Read: now

arxiv Score 27.0

Towards Continual Motion-Language Agents: LoRA Variants for Incremental Motion Understanding and Generation

2026-06-29 · Bertram Taetz, Hugo Albuquerque Cosme da Silva, Gabriele Bleser-Taetz

Research Track A · General AI

Motion-language agents must possess the bidirectional capability to both understand human movement (motion-to-text, M2T) and generate it from natural language (text-to-motion, T2M). While foundational models have achieved strong performance in static settings, autonomous agents operating in dynamic environments must co…

Review: pending · Role: unreviewed · Read: now

arxiv Score 24.5

Parametric Skills

2026-06-29 · Xuan Zhao, Haonan He, Qingyu Yang, Minglei Li, Jingqi Ye, Zelin Tan, Bo Wan, Peng Ye

Research Track A · General AI

Since intelligence fundamentally relies on efficient skill acquisition (Chollet, 2019), the ability to leverage skills is critical. For LLMs, skills, manually authored or extracted from task trajectories, are textual recipes encoding mature problem-solving experience and are critical to agentic capabilities. Despite wi…

Review: pending · Role: unreviewed · Read: now

arxiv Score 22.8

Self-Evolving World Models for LLM Agent Planning

2026-06-29 · Xuan Zhang, Wenxuan Zhang, See-Kiong Ng, Yang Deng

General AI

World models offer a principled way to equip long-horizon LLM agents with foresight: predictions of action consequences before execution. However, unreliable foresight can be ignored, misused, or even degrade downstream decision-making. In this paper, we introduce WorldEvolver, a self-evolving world model framework tha…

Review: pending · Role: unreviewed · Read: now

arxiv Score 22.4

Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning

2026-06-25 · Tianyi Men, Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

Research Track B · General AI

Multimodal web agents can assist humans in operating repetitive GUI tasks, where effective task planning is essential for decomposing complex tasks into executable actions. While small open source MLLMs are cost efficient and privacy preserving compared with commercial large models, they suffer from weak planning and l…

Review: pending · Role: unreviewed · Read: now

arxiv Score 21.5

The Forgetting-Retention Dilemma: Certified Unlearning Theory in Continual Learning

2026-06-29 · Yiting Hu, Lingjie Duan, Qian Zhang

Research Track A

Machine unlearning aims to eliminate the influence of specific data from trained models to safeguard privacy. However, this presents a significant challenge in the context of continual learning (CL), where models update sequentially on dynamic datasets. A major limitation is that current certified unlearning algorithms…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.8

Customized Generative AI Agent for Transportation Engineering Practice: A Development and Continued Pre-training Guideline

2026-06-27 · Dianwei Chen, Yuan-Zheng Lei, Zifan Zhang, Yuchen Liu, Xianfeng Yang

General AI

Recent advancements in generative artificial intelligence (AI) and large language models (LLMs) have shown significant promise in automating complex reasoning, summarization, and question-answering tasks. However, the effectiveness of general-purpose LLMs in specialized engineering domains remains limited due to insuff…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.8

OmniCoT: A Benchmark for Global and Multi-Step Panoramic Reasoning

2026-06-29 · Haocong He, Chenfei Liao, Zichen Wen, Zihao Dongfang, Xu Zheng, Bin Ren, Chang Su, Zixin Zhang, Harold Haodong Chen, Hongfei Zhang, Weijia Li, Kailun Yang, Conghui He, Xuming Hu, Nicu Sebe, Linfeng Zhang

General AI

Multimodal Large Language Models (MLLMs) have demonstrated promising spatial reasoning capabilities, while these abilities remain underexplored in the emerging visual modality of panoramic imagery. The full 360°×180° field of view of panoramas essentially supports complex global multi-step reasoning, which is al…

Review: pending · Role: unreviewed · Read: now

arxiv Score 19.4

LiMoDE: Rethinking Lifelong Robot Manipulation from a Mixture-of-Dynamic-Experts Perspective

2026-06-24 · Zhihao Gu, Lin Wang

Research Track A · General AI

Building a generalist robot that can leverage prior knowledge for continuous task adaptation remains a significant challenge. Previous works alleviate the catastrophic forgetting problem by parameter-efficient fine-tuning for single-task adaptation. However, they fail to extract reusable skills and model the interactio…

Review: pending · Role: unreviewed · Read: now

huggingface Score 19.4

TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents

2026-06-26 · Shoufa Chen, Luyuan Wang, Xuan Yang, Zhiheng Liu, Yuren Cong, Yuanfeng Ji, Feiyan Zhou, Xiaohui Zhang, Fanny Yang, Belinda Zeng

General AI

As large language models and harness frameworks continue to advance, agents operating in terminals are increasingly capable of performing a broader range of general computer-use tasks beyond coding. However, existing benchmarks do not adequately evaluate general-purpose terminal computer-use agents (TUAs): general comp…

Review: pending · Role: unreviewed · Read: now

huggingface Score 19.4

Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning

2026-06-26 · Hohin Kwan, Hongyu Li, Ray Zhang, Manyuan Zhang, Xianghao Kong, Anyi Rao, Jiahao Xie, Si Liu

General AI

Recent interest in multimodal large language models (MLLMs) raises a central question: can they reason over dynamic visual evidence rather than merely recognize objects or events in individual frames? This ability, which we refer to as video temporal-logical reasoning, requires models to maintain, update, and compose e…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.8

Mandol: An Agglomerative Agent Memory System for Long-Term Conversations

2026-06-29 · Yuhan Zhang, Zhiyuan Guo, Ziheng Zeng, Wei Wang, Wentao Wu, Lijie Xu

General AI

Long-term conversational agents need to remember and query cross-session, multi-typed information with complex correlations. Existing agent memory systems rely on heterogeneous vector and graph databases, which fragment memory information and cause high cross-database I/O latency. For retrieval, common RAG-style method…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.8

Training Vision-Language-Action Models with Dense Embodied Chain-of-Thought Supervision

2026-06-29 · Haoyang Li, Guanlin Li, Youhe Feng, Chen Zhao, Zhuoran Wang, Yang Li, Qizhe Wei, Shifeng Bao, Haitao Shen, Yihan Zhao, Tong Yang, Jing Zhang

General AI

Cross-embodiment transfer in vision-language-action (VLA) models remains challenging because low-level state and action spaces differ fundamentally across robot platforms. We observe that the high-level cognitive process underlying manipulation, including scene perception, object identification, task planning, and sub-…

Review: pending · Role: unreviewed · Read: now

arxiv Score 18.0

EVAF: A Test-Retest Protocol for Selective Parametric Consolidation

2026-06-29 · Haoliang Han

Research Track A · General AI

Long-running language agents need mechanisms for deciding which experiences should persist after the working context is gone. Retrieval systems can reinsert past text, but they do not by themselves show that an experience has been selectively consolidated into the model's own behavior. We introduce EVAF, an Echo-Valenc…

Review: pending · Role: unreviewed · Read: now

huggingface Score 17.2

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

2026-06-28 · Mengqi Yuan, Zilong Zhou, Xinzhuang Xiong, Weiming Wu, Jiayang Sun, Jiamin Song, Kaiqian Cui, Bowen Wang, Haoyuan Wu, Yitong Li, Dunjie Lu, Haikong Lu, Qi Zhen, Xinyuan Wang, Jiaqi Deng, Yuhao Yang, Cheng Chen, Boyuan Zheng, Alex Su, Xiao Yu, Hao Zou, Saaket Agashe, Xing Han Lu, Manpreet Kaur, Zhengyang Qi, Vincent Sunn Chen, Frederic Sala, Dayiheng Liu, Junyang Lin, Zhou Yu, Yu Su, Siva Reddy, Xin Eric Wang, Peng Qi, Tianbao Xie, Tao Yu

Research Track B · General AI

Existing computer-use benchmarks fail to capture the realism, complexity, and long-horizon demands of real-world computer use, limiting their ability to reveal the limitations of frontier agents. We introduce OSWorld 2.0, a benchmark of 108 long-horizon computer-use workflows across everyday and professional tasks, des…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.8

SADL: What to Ignore? A Benchmark for Subject-Aware Distractor Localization

2026-06-29 · Cao-Tri Nguyen, Nguyen-Khoa Luong, Vinh-Tiep Nguyen, Minh-Triet Tran

General AI

Photographs frequently contain visual distractors besides foregrounds and backgrounds of the intended subject, competing for attention and weakening composition. While modern editing tools streamline object removal, identifying which objects to remove remains a mostly manual process. Existing saliency models and…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.5

Rehearsed Multi-Agent Live Product Demonstrations with Real-Time Voice Question Answering

2026-06-29 · Rahul Khedar, Mayank Malhotra, Avinash Karn, Mouli V, Prakhar Mehrotra

Research Track B · General AI

Live product demonstrations are a recurring, high-cost activity in software organizations: a human presenter must select features, dispatch the corresponding interactions on a running application, narrate them coherently, and answer questions in real time. Existing automation addresses only fragments -- generalist brow…

Review: pending · Role: unreviewed · Read: now

arxiv Score 15.0

Few-Shot Domain Incremental Learning via Continual Vision-Language Consolidation

2026-06-29 · Naeem Paeedeh, Mahardhika Pratama, Wolfgang Mayer, Mukesh Prasad, Weiping Ding, Yew-Soon Ong

Research Track A · General AI

Existing domain-incremental learning (DIL) strategies call for massive amounts of data to adapt to new domains and suffer from the overfitting problem in the case of data scarcity. This paper puts forward a relatively uncharted problem, namely, few-shot domain incremental learning (FSDIL), taking into account the probl…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.8

GROW²: Grounding Which and Where for Robot Tool Use

2026-06-29 · Yuhong Deng, Yuyao Liu, David Hsu

General AI

Can the robot use a plate to cut a cake if no knife is available? Tool use greatly expands robot capabilities, but to use tools creatively beyond their intended functions, the robot faces the challenge of open-world affordance grounding: select an open-category object to act as a tool and localize its specif…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.8

Toward Secure and Reliable PDDL Formalization of Large Language Models with Planner-in-the-Loop Feedback

2026-06-29 · Jiamei Jiang, Jiajing Zhang, Feifei Mo, Linjing Li, Daniel Zeng

General AI

Planning often requires symbolic specifications that are both executable and verifiable. For large language models deployed in autonomous or decision-support systems, failures in such formalization may lead to unverifiable decisions, execution failures, or unsafe downstream behavior. We present NL-PDDL-Bench, a multi-d…

Review: pending · Role: unreviewed · Read: now

arxiv Score 14.0

Manufactured Confidence: How Memory Consolidation Turns Hearsay into Confident Facts

2026-06-28 · Alex Kwon

Research Track A · General AI

LLM agents carry conclusions across steps and sessions in compressed memory, and memory products (e.g., mem0, LangMem) rewrite conversation into stored "facts" that later steps trust. We show this rewriting manufactures confidence: across our constructed agent settings, a casual, hedged remark becomes a confident, date…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.8

Learning from Mistakes: Rollout-Retrieval Lifelong Policy Learning for Autonomous Driving

2026-06-29 · Cheng Gong, Haoyang Wang, Chao Lu, Zirui Li, Jianwei Gong

Research Track A · General AI

Autonomous driving policies should be able to improve continually as deployment exposes them to increasingly diverse and long-tail traffic situations. However, most learning-based policies are trained or fine-tuned on expert demonstrations and then rely largely on generalization to handle challenging closed-loop scenar…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.8

Uncertainty-Aware Generation and Decision-Making Under Ambiguity

2026-06-29 · Nico Daheim, Iryna Gurevych

General AI

With rapidly improving capabilities, Large Language Models (LLMs) are increasingly used in many complex real-world tasks. Beyond requiring in-depth knowledge and reasoning skills, many of these tasks exhibit a high degree of subjectivity and require that the outputs of the model can be trusted. While a lot of progress …

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.4

SKILL-DISCO: Distilling and Compiling Agent Traces into Reusable Procedural Skills

2026-06-25 · Zhongxin Guo, Danrui Qi, Hanwen Gu, Peng Cheng, Yongqiang Xiong

Research Track B · General AI

Agents often repeatedly solve similar task instances from scratch, leading to unnecessary reasoning cost and long execution traces. Prior work has explored workflow reuse and executable skill induction, but it remains unclear which task scenarios admit procedural skills and how the shared procedural structure should be…

Review: pending · Role: unreviewed · Read: now

huggingface Score 13.0

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

2026-06-27 · Han Luo, Bingbing Wen, Lucy Lu Wang

General AI

LLM agents are expected to act over multiple turns, using search, browsing interfaces, and terminal tools to complete user goals. Yet not every goal is well specified or achievable in the available environment. In such cases, a reliable agent should recognize that further interaction is unlikely to help and abstain fro…

Review: pending · Role: unreviewed · Read: now

arxiv Score 13.0

Convergence of Continual Learning in Homogeneous Deep Networks

2026-06-29 · Matan Schliserman, Gon Buzaglo, Itay Evron, Daniel Soudry

Research Track A

We characterize weakly regularized continual classification in homogeneous models as sequential projections onto task margin sets. This result generalizes prior analyses restricted to either stationary (single-task) deep models or continual linear models. We show that global convergence generally fails, even for simple…

Review: pending · Role: unreviewed · Read: now

arxiv Score 12.8

Words Speak Louder Than Code: Investigating Cognitive Heuristics in LLM-Based Code Vulnerability Detection

2026-06-29 · Asif Shahriar, Hongyu Cai, Hadjer Benkraouda, Gang Wang, Z. Berkay Celik

General AI

Researchers and practitioners increasingly apply Large Language Models (LLMs) for automated vulnerability detection. Recent work has shown that LLMs are susceptible to the same cognitive heuristics that bias human judgment. Yet, no work has investigated whether these heuristics affect a model's assessment of code vulne…

Review: pending · Role: unreviewed · Read: now

huggingface Score 12.0

SafePyramid: A Hierarchical Benchmark for In-context Policy Guardrailing

2026-06-29 · Jiacheng Zhang, Haoyu He, Sen Zhang, Shen Wang, Xiaolei Xu, Yuhao Sun, Meng Shen, Feng Liu

General AI

In real-world applications, guardrails are often expected to identify unsafe user-model interactions according to application-specific safety policies, rather than relying on predefined risk taxonomies. In this work, we study this setting under the paradigm of in-context policy guardrailing, where guardrails predict sa…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.9

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

2026-06-25 · Minbyul Jeong

Research Track B · General AI

Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated, especially outside English. Breadth is also hard to build: certifying that a gold set is complete…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.8

Attractor States Emerge in Multi-Turn LLM Conversations

2026-06-29 · Ting-Wen Ko, Jonas Geiping

General AI

Large language models (LLMs) are increasingly used in open-ended multi-agent settings, but the long-run dynamics of model-model interaction remain poorly understood. We study whether open-ended LLM discussions exhibit attractor-like behavior, i.e. topic-independent stable sets of behaviors which conversations settle i…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.8

Chronos: A Physics-Informed Full-History Framework for Non-Markovian Long-Horizon Manipulation

2026-06-29 · Yulin Zhou, Yimeng Wang, Nengyu Wang, Shaojia Xing, Shiyun Tu, Xiang Li, Jingkai Zhang, Ningbo Jiang, Yuankai Lin, Hua Yang, Xiangrui Zeng, Zhouping Yin

General AI

General-purpose robot policies should be modeled as dynamical systems, yet many VLA and generative imitation policies still rely on present observations or short windows. This Markovian shortcut fails in memory-dependent manipulation: identical observations can demand different actions after different histories. We pre…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.8

On the Internet, Nobody Knows You're an LLM Bot: Unmasking Web Agents with Multi-Layer Fingerprinting

2026-06-29 · Iliana Fayolle, Sihem Bouhenniche, Samuel Pélissier, Pierre Laperdrix, Clémentine Maurice, Walter Rudametkin

Research Track B · General AI

Since 2023, a new class of bots has emerged: Web Agents. They can automate complex tasks on the Web, going beyond traditional browser automation tools such as Selenium, Puppeteer, or Playwright. Leveraging large language models (LLMs), these agents are capable of solving anti-bot mechanisms, mimicking human behavior, a…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.8

SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

2026-06-29 · Mohit Raghavendra, Anisha Gunjal, Aakash Sabharwal, Yunzhong He

Research Track A · General AI

We introduce SWE-Interact, a new testbed for evaluating coding agents on multi-turn, interactive, user-driven software engineering tasks. Existing frontier SWE benchmarks typically provide complete requirements upfront and evaluate agents on autonomous implementation. In contrast, SWE-Interact places agents in a realis…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.8

Sequential Planning via Anchored Robotic Keypoints

2026-06-29 · Bryce Grant, Aryeh Rothenberg, Logan Senning, Zonghe Chua, Zach Patterson, Peng Wang

General AI

We present Sequential Planning via Anchored Robotic Keypoints, SPARK, a training-free neurosymbolic manipulation system that reaches 43.7% on six LIBERO-PRO position & task cells, more than doubling CaP-Agent0 and Vision-Language-Action (VLA) baselines. CaP-Agent0, a multi-turn code-generation agent, achieves 18.2% by…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.5

DOPD: Dual On-policy Distillation

2026-06-29 · Xinlei Yu, Gen Li, Qingyi Si, Guibin Zhang, Yuqi Xu, Congcong Wang, Shuai Dong, Kaiwen Tuo, Xiangyu Zeng, Kaituo Feng, Qunzhong Wang, Yang Shi, Xiaobin Hu, Xiangyu Yue, Jiaqi Wang, Shuicheng Yan

Research Track A · General AI

On-policy distillation (OPD) offers superior capacity transfer by supervising student-sampled trajectories with dense token-level signals. To furnish high-quality supervision sources and thereby elevate the performance frontier of distillation, an intuitive direction is to infuse privileged information to either teache…

Review: pending · Role: unreviewed · Read: now

arxiv Score 11.5

Theory of Continual Learning Against Data Poisoning Attacks

2026-06-29 · Yiting Hu, Lingjie Duan

Research Track A · General AI

Continual learning (CL), where a model is trained on a sequence of data tasks, is increasingly being adopted across key fields such as large language models and image recognition, yet it remains highly vulnerable to data poisoning that triggers learning divergence or severe excess risk. Despite these threats, a princip…

Review: pending · Role: unreviewed · Read: now

huggingface Score 11.4

Trimming the Long-Tail of Visual World Modeling Evaluation

2026-06-23 · Bingxuan Li, Yining Hong, Cheng Qian, Hyeonjeong Ha, Jiateng Liu, Zhenhailong Wang, Yue Guo, Yunzhu Li, Heng Ji

General AI

Physical interactions follow a long-tailed distribution: a set of common and regular interactions dominates human experience and visual data, while a broad spectrum of rare and irregular interactions remains underrepresented. Although recent visual world models, including image and video generation models, achieve impr…

Review: pending · Role: unreviewed · Read: now

huggingface Score 11.4

TheoremGraph: Bridging Formal and Informal Mathematics

2026-06-24 · Simon Kurgan, Evan Wang, Eric Leonen, Sophie Szeto, Luke Alexander, Artemii Remizov, Jarod Alper, Giovanni Inchiostro, Vasily Ilin

General AI

Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level d…

Review: pending · Role: unreviewed · Read: now

huggingface Score 11.0

Learning Transferable Dynamics Priors from Action to World Modeling

2026-06-28 · Ze Huang, Jiahui Zhang, Hairuo Liu, Chenxi Zhang, Ran Cheng, Li Zhang

General AI

We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pre…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 11.0

Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

2026-06-29 · Zhiqi Li, Chengrui Dong, Zhenhua Du, Hangning Zhou, Cong Qiu, Hailong Qin, Mu Yang, Dongxu Wei, Peidong Liu

General AI

Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. We propose Walking in the Implicit, a scene-centric paradigm that changes the rollout variable from frame latents to a fi…

Review: pending · Role: unreviewed · Read: now

arxiv Score 10.8

BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language

2026-06-29 · Haitao Wu, Qirui Zhang, Zhouheng Yao, Shangquan Sun, Qihao Zheng, Mianxin Liu, Chi Zhang, Wanli Ouyang, Chunfeng Song, Changqing Zhang, Jiamin Wu

General AI

Modeling the bidirectional correspondence between external sensory stimuli and internal neural activity has emerged as a critical frontier in neuroscience. However, existing approaches predominantly treat brain encoding and decoding as isolated tasks, relying heavily on unimodal alignment and external priors while over…

Review: pending · Role: unreviewed · Read: now

arxiv Score 10.8

LeVo 2: Stable and Melodious Song Generation via Hierarchical Representation Modeling and Progressive Post-Training

2026-06-29 · Shun Lei, Huaicheng Zhang, Dapeng Wu, Yaoxun Xu, Lishi Zuo, Wei Tan, Hangting Chen, Guangzheng Li, Jianwei Yu, Zhiyong Wu, Dong Yu

General AI

Full-length song generation must preserve coherence and musicality, render detailed vocal and accompaniment acoustics, and follow lyrics and prompts. Existing language model-based systems face a structural trade-off: mixed-token modeling preserves vocal-instrument coordination but obscures track-specific details, where…

Review: pending · Role: unreviewed · Read: now

arxiv Score 10.8

PyMETA: A Benchmark Dataset for Hierarchical Student Code Error Classification with Python-Interpreter-Based Labels

2026-06-29 · Chuyue Li, Ziqi Tang, Jingyi Wang, Yu Wu, Kazuma Hashimoto, Lingyu Gao

General AI

With the advancement of Large Language Models (LLMs), code error detection has extended beyond traditional IDE diagnostics to context-sensitive debugging in educational scenarios. However, existing approaches lack large-scale datasets, multi-error analysis, and unified error taxonomies. To address this, we introduce Py…

Review: pending · Role: unreviewed · Read: now

arxiv Score 10.8

UnfoldArt: Zero-Shot Recovery of Full Articulated 3D Objects from Text or Image

2026-06-29 · Mohamed El Amine Boudjoghra, Ivan Laptev, Angela Dai

General AI

Articulated 3D objects are essential for interactive environments in embodied AI, robotics, and virtual reality, but reconstructing their structure and motion from sparse observations remains challenging. Existing approaches remain largely constrained by lack of supervised data or lack the priors needed to reliably rec…

Review: pending · Role: unreviewed · Read: now

huggingface Score 10.4

LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

2026-06-25 · Xinyu Wang, Chongbo Zhao, Fangneng Zhan, Yue Ma

General AI

Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: maintaining stable backgrounds and non-edited regions over time, and achieving the low latency required for real-time interactive scenarios. Meanwhile, recent streaming video generation methods are mostly deve…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 10.4

ZooClaw-FashionSigLIP2: Distilled Fine-tuning for Robust Fashion Retrieval

2026-06-26 · Siqiao Xue, Chunxue Xu

General AI

Adapting a foundation vision-language encoder to a specialized retrieval task creates a fundamental tradeoff: gains on the target distribution come at the cost of the foundation model's broad generalization, and fashion retrieval is a stringent instance of this problem. We present ZooClaw-FashionSigLIP2, a fashion-spec…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 10.0

DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model

2026-06-29 · Daniyel Ayupov, Artur Markov-Tsoy

General AI

We present DreamForge-World 0.1 Preview, a preview foundational world model for real-time interactive world simulation. The system adapts the LongLive 1 autoregressive video stack, itself derived from Wan2.1-T2V-1.3B, with a residual action pathway inspired by the Matrix-Game family. DreamForge-World 0.1 Preview focuse…

Review: pending · Role: unreviewed · Read: soon

huggingface Score 10.0

The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

2026-06-29 · Yuxi Wang, Chengkai Jin, Yufei Liu, Wenqi Ouyang, Tianyi Wei, Zhiwei Zeng, Siyuan Huang, Zhiqi Shen, Xingang Pan

General AI

4D hand motion reconstruction from egocentric video is bottlenecked by clear limitations of existing methods: image-based pipelines depend on a detector that fails under heavy occlusion, while video-based methods rely on temporal modules learned only from scarce hand-pose annotations, a narrow signal insufficient to mo…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.8

AI Premium

2026-06-29 · Nicola Borri, Yukun Liu, Aleh Tsyvinski

General AI

Using 380 trillion tokens of realized AI consumption across more than four hundred large language models from the licensed proprietary OpenRouter dataset covering approximately 2 percent of current global monthly AI token consumption, we analyze how AI affects firms, markets, and workers. Leveraging the unprecedented s…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.8

C²R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders

2026-06-29 · Haoran Jin, Xiting Wang, Shijie Ren, Hong Xie, Defu Lian

General AI

Sparse Autoencoders (SAEs) are widely used to interpret large language models by decomposing activations into sparse, human-understandable features, but scaling to large dictionaries exposes fundamental challenges. Systematic studies reveal pervasive feature splitting that fragments coherent concepts into non-atomic la…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.8

Learning from Reliable Latent Prompts for Visual Recognition with Missing Modalities

2026-06-29 · Taixi Chen, Nancy Guo

General AI

Large-scale multimodal models (LMMs) have achieved superior performance in visual recognition by synergizing information across diverse, massive-scale paired modalities. In real-world scenarios, however, missing-modality inputs are ubiquitous, causing models optimized for modality-complete data to exhibit precipitous p…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.8

Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

2026-06-29 · Lei Bai, Zongsheng Cao, Yang Chen, Zhiyao Cui, Shangheng Du, Yue Fan, Shiyang Feng, Zijie Guo, Haonan He, Liang He, Xiaohan He, Shuyue Hu, Yusong Hu, Songtao Huang, Yichen Jiang, Hao Li, Xin Li, Dahua Lin, Weihao Lin, Fenghua Ling, Dongrui Liu, Zhuo Liu, Runmin Ma, Chunjiang Mu, Haoyang Peng, Tianshuo Peng, Jinxin Shi, Luohe Shi, Boyuan Sun, Zelin Tan, Shengji Tang, Qianyi Wang, Yiming Wu, Yi Xie, Xiangchao Yan, Jingqi Ye, Peng Ye, Fangchen Yu, Jiakang Yuan, Bihao Zhan, Bo Zhang, Chen Zhang, Shufei Zhang, Shuaiyu Zhang, Wenlong Zhang, Yiqun Zhang, Junpeng Zhao, Zhijie Zhong, Bowen Zhou, Yuhao Zhou

General AI

We introduce Agents-A1, a 35B Mixture-of-Experts Agentic Model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-ho…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 9.0

Energy-Aware Scheduling for Serverless LLM Serving on Shared GPUs

2026-06-29 · Tianyu Wang, Gourav Rattihalli, Aditya Dhakal, Longfei Shangguan, Dejan Milojicic

Research Track A

As LLM inference becomes a major cloud workload, its growing energy footprint makes cluster-wide energy optimization increasingly important. Serverless LLM serving helps platforms absorb traffic volatility by elastically sharing GPU resources across models, but this sharing also makes energy optimization difficult. Mul…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 8.8

Fog Computing and Large Language Models: A vision for the mutual beneficiaries

2026-06-28 · Satish Narayana Srirama

General AI

Fog computing utilizes proximal computational resources for sensor data processing and actuation, and addresses the latency, network load, and privacy issues of cloud-centric Internet of Things. On the other hand, Large Language Models (LLMs) are a type of deep learning AI models, which are trained on enormous text dat…

Review: pending · Role: unreviewed · Read: soon

arxiv Score 8.8

Building Multi-Task Agentic LLMs via Two-Phase Distillation

2026-06-29 · Huaijie Wang, Shusheng Xu, Yi Wu, Kaifeng Lyu

General AI

A key step toward artificial general intelligence is to train models that can perform multiple tasks. In this paper, we study how to build such models by first training separate RL experts for individual tasks and then consolidating them via distillation, as an alternative to directly training a single model on mixed t…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

MESA: Prioritizing Vulnerable Communication Channels for Securing Multi-Agent Systems

2026-06-29 · Kunyang Li, Kyle Domico, Jonathan Gregory, Patrick McDaniel

General AI

Multi-agent systems (MAS) are increasingly used to automate complex, distributed workflows. However, their inter-agent communication channels introduce new attack surfaces that remain poorly understood and are difficult to defend against. In this paper, we address how defenders should prioritize limited security effort…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Open-Vocabulary and Referring Segmentation for 3D Gaussians Using 2D Detectors

2026-06-29 · Jameel Hassan, Yasiru Ranasinghe, Vishal Patel

General AI

3D Gaussian Splatting (3DGS) has emerged at the forefront of 3D scene reconstruction. Extending 3DGS with language-driven, open-vocabulary understanding has gained significant attention for real-world applications such as embodied AI. Recent methods achieve this by learning an instance feature attribute and assigning s…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms

2026-06-29 · Ziwei Su, Junyu Ren, Victor Veitch

General AI

Contrastive embedding models trained with scale-invariant losses are typically paired with distance metrics like cosine similarity, effectively ignoring embedding magnitudes. Surprisingly, empirical studies reveal that these "discarded" norms nonetheless seem to correlate with semantic properties such as con…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Orca: The World is in Your Mind

2026-06-29 · Yihao Wang, Yuheng Ji, Mingyu Cao, Yanqing Shen, Runze Xiao, Huaihai Lyu, Senwei Xie, Euan Liu, Klara Tian, Tianfeng Long, Yichi Zhang, Zhengliang Cai, Ruike Chen, Jifan Zhao, Ruochuan Shi, Zihan Tang, Jing Lyu, Wenxing Tan, Ningbo Zhang, Yangtao Hu, Yuming Gao, Xiansheng Chen, Junkai Zhao, Congsheng Xu, Boan Zhu, Ziqi Wang, Yupu Feng, Qiongqiong Zhang, Yingli Zhao, Yulong Ao, Shaoxuan Xie, You Liu, Guocai Yao, Leiduo Zhang, Xiaodan Liu, Yunyan Zhang, Yance Jiao, Xinyan Yang, Jiaxing Wei, Xu Liu, Tengfei Pan, Shaokai Nie, Chunlei Men, Sen Cui, Xiaojie Jin, Hongyang Li, Jianlan Luo, Yao Mu, Yunchao Wei, Jun Yan, Hang Zhao, Xiaolong Zheng, Jiaming Li, Yonghua Lin, Tiejun Huang, Zhongyuan Wang, Pengwei Wang

General AI

We introduce Orca, an initial instantiation of a general world foundation model. Orca learns a unified world latent space from multimodal world signals and exposes it through multimodal readout interfaces. Rather than optimizing isolated next-token, next-frame, or next-action prediction, we are centered on Next-State-P…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Poller: Are LLMs Suitable for Evaluating the Poetry Understanding Task?

2026-06-29 · Shanshan Wang, Derek F. Wong, Jingming Yao, Lidia S. Chao

General AI

Traditional automatic evaluation methods have been shown to be unsuitable for modern Chinese poetry because of the distinct nature of this literary genre. Human evaluation remains reliable, but is expensive and not applicable to large-scale data. In this paper, we propose Poller (Poetry LLM Evaluator), a novel method l…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 8.8

Set-Inclusive Uncertainty Modeling for Robust Brain Tumor Segmentation

2026-06-29 · Seunghun Baek, Jihwan Park, Jaeyoon Sim, Hoseok Lee, Seungjoo Lee, Won Hwa Kim

General AI

Multimodal MRI is essential for accurate brain tumor segmentation. However, acquiring all modalities at inference is often challenging in practice, which causes intrinsic uncertainty due to unavoidable information loss. Without modeling this uncertainty, existing methods encode incomplete evidence into deterministic re…

Review
pending
Role
unreviewed
Read
soon
huggingface Score 8.4

Large-Scale Tunnel Air-Ground Collaboration With FLISP: Fast LiDAR-IMU Synchronized Path Planner

2026-06-25 · Fenghe Guo, Runjie Shen, Chenyang Sun, Junrui Zhang, Quanxi Zhan, Yongchun Wang, Junjie Zhang

General AI

Hydropower tunnel inspection is critical for infrastructure integrity yet remains inefficient and hazardous using manual methods. We propose FLISP (Fast LiDAR-IMU Synchronized Path Planner), a mapless planning framework for cooperative UGV-UAV inspection. Unlike traditional map-based paradigms, FLISP features three cor…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Pessimism's Paradox: Conservative Offline Training Amplifies Reward Hacking During Online Adaptation in Reasoning Models

2026-06-29 · Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary

General AI

Conservative offline training is widely advocated as a safe foundation for subsequent online adaptation: if a policy stays close to well-supported behaviour, the argument goes, it is less likely to exploit imperfections in a learned reward model. We challenge this intuition empirically and mechanistically. We train a Q…

Review
pending
Role
unreviewed
Read
soon
arxiv Score 7.8

Residual-Guided Expert Specialization for Incomplete Multimodal Learning

2026-06-29 · Seunghun Baek, Jihwan Park, Jaeyoon Sim, Minjae Jeong, Hoseok Lee, Won Hwa Kim

General AI

As real-world prediction systems often face missing modalities at inference, incomplete multimodal learning (IML) remains a practical challenge. While prior methods aim to learn representations robust to missing inputs, representations from incomplete modalities inevitably deviate from their full-modality counterparts …

Review
pending
Role
unreviewed
Read
soon
arxiv Score 5.8

BaRA: Bayesian Adaptive Rank Allocation for Parameter-Efficient Fine-Tuning

2026-06-28 · Zhibin Duan, Yuhong Wang, Jiahong Fu, Zongsheng Yue, Bo Chen, Zongben Xu

General AI

While low-rank adaptation (LoRA) enables highly efficient fine-tuning by constraining task-specific updates to fixed low-rank subspaces, this rigid design limits representational flexibility and often results in overconfident predictions and miscalibrated uncertainty, especially in low-data regimes. Recent Bayesian LoR…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

Mechanistically Eliciting Latent Behaviors in Language Models

2026-06-28 · Andrew Mack, Nina Panickssery, Alexander Matt Turner

General AI

We aim to discover diverse, generalizable perturbations of LLM internals that can surface hidden behavioral modes. Such perturbations could help reshape model behavior and systematically evaluate potential risks. We introduce Causal Perturbative Elicitation (CPE), an unsupervised method for discovering interpretable lo…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

MirrorPPR: Exemplar-Based Portrait Photo Retouching

2026-06-28 · Zhihong Liu, Zheng Li, Jiachun Jin, Siqi Kou, Yitao Jian, Fengpei Yu, Zhijie Deng

General AI

While text-guided image editing has made remarkable progress, it remains limited in structural portrait retouching. Textual descriptions struggle to convey fine-grained changes to facial features and body proportions. To address this gap, we introduce Exemplar-Based Portrait Photo Retouching, where the model is given a…

Review
pending
Role
unreviewed
Read
later
arxiv Score 5.8

VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes

2026-06-29 · Yen-Jen Wang, Jiaman Li, Sirui Chen, Takara E. Truong, Pei Xu, Pieter Abbeel, Rocky Duan, Koushil Sreenath, Angjoo Kanazawa, Carmelo Sferrazza, Guanya Shi, Karen Liu

General AI

Perception-based humanoid loco-manipulation requires connecting egocentric observations and task instructions to whole-body motion. Learning this mapping requires synchronized egocentric images, language commands, and robot-compatible kinematic trajectories, yet no existing data source provides this complete tuple at s…

Review
pending
Role
unreviewed
Read
later
huggingface Score 5.4

How Good Can Linear Models Be for Time-Series Forecasting?

2026-06-25 · Lang Huang, Jinglue Xu, Luke Darlow

General AI

Time-series forecasting research has been moving steadily toward larger architectures, from specialized transformers to general-purpose foundation models, on the assumption that capacity is what unlocks accuracy. We take the opposite position: most of the gap can be closed at far lower cost by tuning preprocessing rath…

Review
pending
Role
unreviewed
Read
later