Research Paper Cockpit

Daily Digest - 2026-04-29

Papers first seen in this daily snapshot.

Papers

48 visible entries

arxiv Score 24.0

TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning

2026-04-28 · Dominik Żurek, Kamil Faber, Marcin Pietron, Paweł Gajewski, Roberto Corizzo

Research Track A · General AI

Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time, but adapting the model in live environment interactions is expensive, risky, …

Review: pending · Role: unreviewed · Read: now
arxiv Score 18.8

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks

2026-04-27 · Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

Research Track B · General AI

Existing web agent benchmarks have largely converged on short, single-site tasks on which frontier models are approaching saturation. However, real-world web use consists of long-horizon, multi-site workflows. Common web navigation tasks, such as comparing products across different domains, planning trips across multipl…

Review: pending · Role: unreviewed · Read: now
arxiv Score 16.8

From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling

2026-04-28 · Jianghao Lin, Zi Ling, Chenyu Zhou, Tianyi Xu, Ruoqing Jiang, Zizhuo Wang, Dongdong Ge

General AI

Optimization modeling underpins real-world decision-making in logistics, manufacturing, energy, and public services, but reliably solving such problems from natural-language requirements remains challenging for current large language models (LLMs). In this paper, we propose Agora-Opt, a modular agentic framework…

Review: pending · Role: unreviewed · Read: now
arxiv Score 16.8

RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion

2026-04-28 · Guanglin Niu, Bo Li

General AI

Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. Therefore, we…

Review: pending · Role: unreviewed · Read: now
arxiv Score 16.8

Recursive Multi-Agent Systems

2026-04-28 · Xiyuan Yang, Jiaru Zou, Rui Pan, Ruizhong Qiu, Pan Lu, Shizhe Diao, Jindong Jiang, Hanghang Tong, Tong Zhang, Markus J. Buehler, Jingrui He, James Zou

General AI

Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend this scaling principle from a single model to multi-agent systems, and ask: Can agent collaboration itself be scaled through recursion? To …

Review: pending · Role: unreviewed · Read: now
arxiv Score 16.8

Toward Multimodal Conversational AI for Age-Related Macular Degeneration

2026-04-28 · Ran Gu, Benjamin Hou, Mélanie Hébert, Asmita Indurkar, Yifan Yang, Emily Y. Chew, Tiarnán D. L. Keenan, Zhiyong Lu

General AI

Despite strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multimodal large language models (MLLMs) integrate diagnostic predictions with clinically meaningful dialogue to support clin…

Review: pending · Role: unreviewed · Read: now
arxiv Score 15.8

CoRE: Concept-Reasoning Expansion for Continual Brain Lesion Segmentation

2026-04-28 · Qianqian Chen, Anglin Liu, Jingyang Zhang, Yudong Zhang

Research Track A · General AI

Accurate brain lesion segmentation in MRI is vital for effective clinical diagnosis and treatment planning. Due to high annotation costs and strict data privacy regulations, universal models must employ Continual Learning (CL) to adapt to evolving clinical tasks without losing previously acquired knowledge. Howev…

Review: pending · Role: unreviewed · Read: now
huggingface Score 15.0

Co-Director: Agentic Generative Video Storytelling

2026-04-27 · Yale Song, Yiwen Song, Nick Losier, Nathan Hodson, Ye Jin, Rhyard Zhu, Yan Xu, Daniel Vlasic, Carina Claassen, Jasmine Leon, Khanh G. LeViet, Zack Chomyn, Joe Timmons, Brett Slatkin, Scott Penberthy, Tomas Pfister

General AI

While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hier…

Review: pending · Role: unreviewed · Read: now
huggingface Score 15.0

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

2026-04-27 · Chenkai Pan, Xinglong Xu, Yuhang Xu, Yujun Wu, Siyuan Li, Jintao Chen, Conghui He, Jingxuan Wei, Cheng Tan

General AI

Reliably transferring specialized human knowledge from text into large language models remains a fundamental challenge in artificial intelligence. Fine-tuning on domain corpora has enabled substantial capability gains, but the process operates without feedback: when a model fails on a domain task, there is no method to…

Review: pending · Role: unreviewed · Read: now
huggingface Score 15.0

TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents

2026-04-27 · Jiaqi Wang, Wenhao Zhang, Weijie Shi, Yaliang Li, James Cheng

General AI

On-policy distillation (OPD) has shown strong potential for transferring reasoning ability from frontier or domain-specific models to smaller students. While effective on static single-turn tasks, its behavior in multi-turn agent settings remains underexplored. In this work, we identify a key limitation of vanilla OPD …

Review: pending · Role: unreviewed · Read: now
huggingface Score 15.0

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

2026-04-28 · Lei Xiong, Kun Luo, Ziyi Xia, Wenbo Zhang, Jin-Ge Yao, Zheng Liu, Jingying Shao, Jianlyu Chen, Hongjin Qian, Xi Yang, Qian Yu, Hao Li, Chen Yue, Xiaan Du, Yuyang Wang, Yesheng Liu, Haiyu Xu, Zhicheng Dou

General AI

Autonomous scientific research has advanced significantly thanks to the development of AI agents. One key step in this process is finding the right scientific literature, whether to explore existing knowledge for a research problem or to acquire evidence for verifying assumptions and supporting claims. To assess AI age…

Review: pending · Role: unreviewed · Read: now
arxiv Score 14.8

MARD: A Multi-Agent Framework for Robust Android Malware Detection

2026-04-28 · Xueying Zeng, Youquan Xian, Sihao Liu, Xudong Mou, Yanze Li, Lei Cui, Bo Li

Research Track A · General AI

With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models (LLMs) demonstrate remarkable sem…

Review: pending · Role: unreviewed · Read: now
arxiv Score 14.8

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

2026-04-28 · Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen, Kai Xu, Quanjun Yin, Ee-Chien Chang

Research Track B · General AI

Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into webpage content to induce unintended actions. This threat is further amplified for screenshot-based web agents, which opera…

Review: pending · Role: unreviewed · Read: now
huggingface Score 14.0

Step-Audio-R1.5 Technical Report

2026-04-28 · Yuxin Zhang, Xiangyu Tony Zhang, Daijiao Liu, Fei Tian, Yayue Deng, Jun Chen, Qingjian Lin, Haoyang Zhang, Yuxin Li, Jinglan Gong, Yechang Huang, Liang Zhao, Chengyuan Yao, Hexin Liu, Eng Siong Chng, Xuerui Yang, Gang Yu, Xiangyu Zhang, Daxin Jiang

General AI

Recent advancements in large audio language models have extended Chain-of-Thought (CoT) reasoning into the auditory domain, enabling models to tackle increasingly complex acoustic and spoken tasks. To elicit and sustain these extended reasoning chains, the prevailing paradigm -- driven by the success of text-based reas…

Review: pending · Role: unreviewed · Read: now
arxiv Score 13.8

ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents

2026-04-28 · Zhou Hanlin, Chan Huah Yong

General AI

Long-horizon LLM tasks often fail not because a single answer is unattainable, but because knowledge states drift across rounds, intermediate commitments remain implicit, and interruption fractures the evolving evidence chain. This paper presents ADEMA as a knowledge-state orchestration architecture for long-horizon kn…

Review: pending · Role: unreviewed · Read: now
arxiv Score 13.8

SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring

2026-04-28 · Hector G. Rodriguez, Marcus Rohrbach

General AI

Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering benchmarks approach saturation, reliable deployment requires satisfying low error tolerances in real-world out-of-distribution (OOD) scenarios. Precisely, selective predicti…

Review: pending · Role: unreviewed · Read: now
arxiv Score 12.8

CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation

2026-04-28 · Wei-Chun Chen, Yu-Xuan Chen, I-Fang Chung, Ying-Jia Lin

General AI

Accurate nutrient estimation from unstructured recipe text is an important yet challenging problem in dietary monitoring, due to ambiguous ingredient terminology and highly variable quantity expressions. We systematically evaluate models spanning a wide range of representational capacity, from lexical matching methods …

Review: pending · Role: unreviewed · Read: now
arxiv Score 12.8

MarkIt: Training-Free Visual Markers for Precise Video Temporal Grounding

2026-04-28 · Pengcheng Fang, Yuxia Chen, Xiaohao Cai

General AI

Video temporal grounding (VTG) aims to localize the start and end timestamps of the event described by a given query within an untrimmed video. Despite the strong open-world video understanding and recognition ability of video large language models (Vid-LLMs), outputting precise temporal grounding information remains c…

Review: pending · Role: unreviewed · Read: now
huggingface Score 12.0

GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction

2026-04-27 · Hongxin Li, Yuntao Chen, Zhaoxiang Zhang

Research Track B · General AI

Graphical User Interface (GUI) element grounding (precisely locating elements on screenshots based on natural language instructions) is fundamental for agents interacting with GUIs. Deploying this capability directly on resource-constrained devices like mobile phones is increasingly critical for GUI agents requiring lo…

Review: pending · Role: unreviewed · Read: now
huggingface Score 11.0

Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models

2026-04-23 · Mohammed Safi Ur Rahman Khan, Sanjay Suryanarayanan, Tushar Anand, Mitesh M. Khapra

General AI

Large Vision-Language Models (VLMs) are increasingly used to evaluate outputs of other models, both for image-to-text (I2T) tasks such as visual question answering and for text-to-image (T2I) generation tasks. Despite this growing reliance, the reliability of these Evaluator VLMs remains underexplored. In this work, we system…

Review: pending · Role: unreviewed · Read: now
huggingface Score 11.0

AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark

2026-04-27 · Hongxin Li, Xiping Wang, Jingran Su, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang

Research Track B · General AI

Autonomous agents capable of navigating Graphical User Interfaces (GUIs) hold the potential to revolutionize digital productivity. However, achieving true digital autonomy extends beyond reactive element matching; it necessitates a predictive mental model of interface dynamics and the ability to foresee the "digital wo…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.8

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

2026-04-28 · Jiahang Lin, Shichun Liu, Chengjun Pan, Lizhi Lin, Shihan Dou, Xuanjing Huang, Hang Yan, Zhenhua Han, Tao Gui

General AI

Harnesses have become a central determinant of coding-agent performance, shaping how models interact with repositories, tools, and execution environments. Yet automating harness engineering is hard: a heterogeneous action space, sparse and noisy evaluation signal, multi-million-token trajectories, and edits whose effec…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.8

Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

2026-04-28 · Ajmain Inqiad Alam, Palash Roy, Chanchal K. Roy, Banani Roy, Kevin A. Schneider

General AI

The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities in different SE tasks, they are unmanageably large, slow to deploy, memory-intensive, and carbon-heavy. Thi…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.8

DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

2026-04-28 · Jinxiang Meng, Shaoping Huang, Fangyu Lei, Jingyu Guo, Haoxiang Liu, Jiahao Su, Sihan Wang, Yao Wang, Enrui Wang, Ye Yang, Hongze Chai, Jinming Lv, Anbang Yu, Huangjing Zhang, Yitong Zhang, Yiming Huang, Zeyao Ma, Shizhu He, Jun Zhao, Kang Liu

General AI

Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and assumption of perfect intent. To bridge these gaps, we introduce DV-World, a…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.8

Explainable AI for Jet Tagging: A Comparative Study of GNNExplainer, GNNShap, and GradCAM for Jet Tagging in the Lund Jet Plane

2026-04-28 · Pahal D. Patel, Sanmay Ganguly

General AI

Graph neural networks such as ParticleNet and transformer-based networks on point clouds such as ParticleTransformer achieve state-of-the-art performance on jet tagging benchmarks at the Large Hadron Collider, yet the physical reasoning behind their predictions remains opaque. We present different methods, i.e. perturb…

Review: pending · Role: unreviewed · Read: now
arxiv Score 10.8

From Threads to Trajectories: A Multi-LLM Pipeline for Community Knowledge Extraction from GitHub Issue Discussions

2026-04-28 · Nazia Shehnaz Joynab, Soneya Binta Hossain

General AI

Resolving complex post-production issues in large-scale open-source software (OSS) projects requires significant cognitive effort, as developers must first work through long, unstructured, and fragmented issue discussion threads. In this paper, we present SWE-MIMIC-Bench, an issue trajectory dataset generated…

Review: pending · Role: unreviewed · Read: now
huggingface Score 10.0

BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate

2026-04-28 · Arnon Mazza, Elad Levi

General AI

Deploying guardrails for custom policies remains challenging, as generic safety models fail to capture task-specific requirements, while prompting LLMs suffers from inconsistent boundary-case performance and high inference costs. Training custom classifiers achieves both accuracy and efficiency, yet demands substantial…

Review: pending · Role: unreviewed · Read: now
arxiv Score 9.8

How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum

2026-04-28 · Chu-Cheng Lin, Eugene Ie

General AI

Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewards (RLVR) when the initial success probability $p_0$ is small. Using the Tsallis $q$-logarithm, we define a loss family $J_Q$ that interpolates between RLVR (at $q{=}0$…

Review: pending · Role: unreviewed · Read: now
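
The Tsallis $q$-logarithm named in the abstract above is a standard function, sketched below in Python. The function name `q_log` is an assumption chosen for illustration, and the truncated abstract does not show how the paper builds its loss family from it, so this covers only the function itself.

```python
import math


def q_log(x: float, q: float) -> float:
    """Tsallis q-logarithm: (x**(1 - q) - 1) / (1 - q), with ln(x) at q = 1.

    `q_log` is a hypothetical helper name, not taken from the paper.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if q == 1.0:
        # Limit of the general expression as q -> 1.
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)
```

At $q = 0$ the function reduces to $x - 1$, which is linear in $x$, and as $q$ approaches 1 it tends to the natural logarithm.
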
arxiv Score 9.8

Learning Generalizable Multimodal Representations for Software Vulnerability Detection

2026-04-28 · Zeming Dong, Yuejun Guo, Qiang Hu, Yao Zhang, Maxime Cordy, Hao Liu, Mike Papadakis, Yongqiang Lyu

General AI

Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information em…

Review: pending · Role: unreviewed · Read: now
arxiv Score 9.8

Three Models of RLHF Annotation: Extension, Evidence, and Authority

2026-04-28 · Steve Coyne

General AI

Preference-based alignment methods, most prominently Reinforcement Learning with Human Feedback (RLHF), use the judgments of human annotators to shape large language model behaviour. However, the normative role of these judgments is rarely made explicit. I distinguish three conceptual models of that role. The first is …

Review: pending · Role: unreviewed · Read: now
arxiv Score 9.8

Towards Agentic Investigation of Security Alerts

2026-04-28 · Even Eilertsen, Vasileios Mavroeidis, Gudmund Grov

General AI

Security analysts are overwhelmed by the volume of alerts and the low context provided by many detection systems. Early-stage investigations typically require manual correlation across multiple log sources, a task that is usually time-consuming. In this paper, we present an experimental, agentic workflow that leverages…

Review: pending · Role: unreviewed · Read: now
arxiv Score 9.8

Variational Neural Belief Parameterizations for Robust Dexterous Grasping under Multimodal Uncertainty

2026-04-28 · Clinton Enwerem, Shreya Kalyanaraman, John S. Baras, Calin Belta

General AI

Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poo…

Review: pending · Role: unreviewed · Read: now
arxiv Score 9.3

QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

2026-04-28 · Shuxiang Cao, Zijian Zhang, Abhishek Agarwal, Grace Bratrud, Niyaz R. Beysengulov, Daniel C. Cole, Alejandro Gómez Frieiro, Elena O. Glen, Hao Hsu, Gang Huang, Raymond Jow, Greshma Shaji, Tom Lubowe, Ligeng Zhu, Luis Mantilla Calderón, Nicola Pancotti, Joel Pendleton, Brandon Severin, Charles Etienne Staub, Sara Sussman, Antti Vepsäläinen, Neel Rajeshbhai Vora, Yilun Xu, Varinia Bernales, Daniel Bowring, Elica Kyoseva, Ivan Rungger, Giulia Semeghini, Sam Stanwyck, Timothy Costa, Alán Aspuru-Guzik, Krysta Svore

Research Track A · General AI

Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEval, the first VLM benchmark for quantum …

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.8

Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers

2026-04-28 · Jan Dubiński, Jan Betley, Anna Sztyber-Betley, Daniel Tan, Owain Evans

General AI

Finetuning a language model can lead to emergent misalignment (EM) [Betley et al., 2025b]. Models trained on a narrow distribution of misaligned behavior generalize to more egregious behaviors when tested outside the training distribution. We study a set of interventions proposed to reduce EM. We confirm that these int…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.8

Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling

2026-04-28 · Lucio La Cava, Andrea Tagarelli

General AI

Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 8.8

Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions

2026-04-28 · An Nguyen, Hoang Nguyen, Phuong Le, Hung Pham, Cuong Do, Laurent El Ghaoui

General AI

We study city-scale control of electric-vehicle (EV) ride-hailing fleets where dispatch, repositioning, and charging decisions must respect charger and feeder limits under uncertain, spatially correlated demand and travel times. We formulate the problem as a hex-grid semi-Markov decision process (semi-MDP) with mixed a…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 8.0

A Systematic Post-Train Framework for Video Generation

2026-04-28 · Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, Yuming Li, Xiaoxuan He, Mengzhao Chen, Haoyang Huang, Nan Duan, Ping Luo

General AI

While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deployment requirements due to critical issues such as prompt sensitivity, temporal inconsistency…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 8.0

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

2026-04-28 · Jiayi Guo, Linqing Wang, Jiangshan Wang, Yang Yue, Zeyu Liu, Zhiyuan Zhao, Qinglin Lu, Gao Huang, Chunyu Wang

General AI

Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their initial generation, potentially extending the performance upper bound. Current UMM-based refinement methods primarily…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 7.8

The Nonverbal Syntax Framework: An Evidence-Based Tiered System for Inferring Learner States from Observable Behavioral Cues

2026-04-28 · Sherzod Turaev, Mary John, Jaloliddin Rustamov, Zahiriddin Rustamov, Saja Aldabet, Nazar Zaki, Khaled Shuaib

General AI

Understanding learners' cognitive and affective states underpins adaptive educational systems and effective teaching. Although research links nonverbal cues to internal states, no framework calibrates them to evidence. We present the Nonverbal Syntax Framework, drawn from a systematic review of 908 studies and 17,043 c…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 6.8

A systematic literature Review for Transformer-based Software Vulnerability detection

2026-04-27 · Fiza Naseer, Javed Ali Khan, Muhammad Yaqoob, Alexios Mylonas, Ishaya Gambo

General AI

Context: Software vulnerabilities pose significant security threats to software systems, especially as software is increasingly used across many areas of daily life, including health, government, and finance. Recently, transformer-based models have demonstrated promising results in automatic software vulnerability iden…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 6.0

IAM: Identity-Aware Human Motion and Shape Joint Generation

2026-04-28 · Wenqi Jia, Zekun Li, Abhay Mittal, Chengcheng Tang, Chuan Guo, Lezi Wang, James Matthew Rehg, Lingling Tao, Size An

General AI

Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morpholog…

Review: pending · Role: unreviewed · Read: soon
huggingface Score 6.0

Toward Scalable Terminal Task Synthesis via Skill Graphs

2026-04-28 · Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng Zhang, Lilin Wang

General AI

Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for trajectory sampling. H…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 5.8

From Syntax to Emotion: A Mechanistic Analysis of Emotion Inference in LLMs

2026-04-28 · Bangzhao Shu, Arinjay Singh, Mai ElSherief

General AI

Large language models (LLMs) are increasingly used in emotionally sensitive human-AI applications, yet little is known about how emotion recognition is internally represented. In this work, we investigate the internal mechanisms of emotion recognition in LLMs using sparse autoencoders (SAEs). By analyzing sparse featur…

Review: pending · Role: unreviewed · Read: soon
arxiv Score 4.8

Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases

2026-04-27 · Jun Li, Mingxuan Liu, Jiazhen Pan, Che Liu, Wenjia Bai, Cosmin I. Bercea, Julia A. Schnabel

General AI

Clinical abnormality grounding for rare diseases is often hindered by data scarcity, making supervised fine-tuning impractical and single-pass inference highly unstable. We propose Dynamic Decision Learning (DDL), a framework that enables frozen large vision-language models (LVLMs) to refine their decisions across both…

Review: pending · Role: unreviewed · Read: later
arxiv Score 4.8

Personalized Multi-Interest Modeling for Cross-Domain Recommendation to Cold-Start Users

2026-04-28 · Xiaodong Li, Jiawei Sheng, Jiangxia Cao, Xinghua Zhang, Wenyuan Zhang, Yong Sun, Shirui Pan, Zhihong Tian, Tingwen Liu

General AI

Cross-domain recommendation (CDR) has proven to be an effective solution for alleviating the user cold-start issue. By leveraging the rich user-item interactions available in an informative source domain, CDR can improve recommendation performance for cold-start users in the target domain. Previous CDR ap…

Review: pending · Role: unreviewed · Read: later
arxiv Score 4.8

Pythia: Toward Predictability-Driven Agent-Native LLM Serving

2026-04-28 · Shan Yu, Junyi Shu, Yuanjiang Ni, Kun Qian, Xue Li, Yang Wang, Jinyuan Zhang, Ziyi Xu, Shuo Yang, Lingjun Zhu, Ennan Zhai, Qingda Lu, Jiarong Xing, Youyou Lu, Xin Jin, Xuanzhe Liu, Harry Xu

General AI

As LLM applications grow more complex, developers are increasingly adopting multi-agent architectures to decompose workflows into specialized, collaborative components, introducing structure that constrains agent behavior and exposes useful semantic predictability. Unlike traditional LLM serving, which operates under h…

Review: pending · Role: unreviewed · Read: later
arxiv Score 4.8

Slice Agent: Identifying and Isolating Slices in Shared Open Radio Unit

2026-04-28 · Felipe Arnholda, Flavio Rocha, Lucio Prade, Cristiano Bonato Both

General AI

Network Slice as a Service (NSaaS) is a key enabler of Beyond Fifth Generation (5G) and Sixth Generation (6G) networks, supporting next-generation applications such as extended reality (XR), immersive services, and the tactile Internet. These networks must provide native support for slice-aware services across the enti…

Review: pending · Role: unreviewed · Read: later
arxiv Score 4.8

Towards Unified Multi-task EEG Analysis with Low-Rank Adaptation

2026-04-28 · Sicheng Dai, Kai Chen, Hongwang Xiao, Shan Yu, Qiwei Ye

General AI

Recent self-supervised pre-training methods for electroencephalogram (EEG) have shown promising results. However, the pre-trained models typically require full fine-tuning on each downstream task individually to achieve good performance. In practical applications involving multiple tasks, utilizing a separate model for…

Review: pending · Role: unreviewed · Read: later