Daily - 2026-05-04

arxiv Score 21.2

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

2026-05-01 · Derong Xu, Shuochen Liu, Pengfei Luo, Pengyue Jia, Yingyi Zhang, Yi Wen, Yimin Deng, Wenlin Zhang, Enhong Chen, Xiangyu Zhao, Tong Xu

General AI

Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memor…

Review: pending
Role: unreviewed
Read: now

arxiv Score 19.2

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

2026-05-01 · Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng

General AI

While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence lengt…

Review: pending
Role: unreviewed
Read: now

arxiv Score 17.2

CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

2026-05-01 · Yawen Qin, Ke Qiu, Qin Zhang

General AI

Dance serves as both a cultural cornerstone and a medium for personal expression, yet the rapid growth of online dance content has made personalized discovery increasingly difficult. Text-based dance retrieval offers a natural interface for users to search with choreographic intent, but it remains underexplored because…

Review: pending
Role: unreviewed
Read: now

arxiv Score 17.2

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

2026-05-01 · Arunabh Srivastava, Mohammad A. Khojastepour, Srimat Chakradhar, Sennur Ulukus

General AI

Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges …

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.2

Can Coding Agents Reproduce Findings in Computational Materials Science?

2026-05-01 · Ziyang Huang, Yi Cao, Ali K. Shargh, Jing Luo, Ruidong Mei, Mohd Zaki, Zhan Liu, Wyatt Bunstine, William Jurayj, Somdatta Goswami, Tyrel McQueen, Michael Shields, Jaafar El-Awady, Paulette Clancy, Benjamin Van Durme, Nicholas Andrews, William Walden, Daniel Khashabi

General AI

Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational scientific workflows, where tasks require not only strong coding ability, but also the ab…

Review: pending
Role: unreviewed
Read: now

arxiv Score 16.2

Make Your LVLM KV Cache More Lightweight

2026-05-01 · Xihao Chen, Yangyang Guo, Roger Zimmermann

General AI

Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the …

Review: pending
Role: unreviewed
Read: now

arxiv Score 15.2

Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems

2026-05-01 · Saeid Jamshidi, Foutse Khomh, Carol Fung, Kawser Wazed Nafi

General AI

The adoption of Internet of Things (IoT) systems at the network edge of smart architectures is increasing rapidly, intensifying the need for security mechanisms that are both adaptive and resource-efficient. In such environments, runtime defence mechanisms are no longer limited to detection alone but become a resource-…

Review: pending
Role: unreviewed
Read: now

huggingface Score 15.0

From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

2026-04-27 · Qiliang Liang, Hansi Wang, Zhong Liang, Yang Liu

General AI

LLM agents increasingly rely on reusable skills, capability packages that combine instructions, control flow, constraints, and tool calls. In most current agent systems, however, skills are still represented by text-heavy artifacts, including SKILL.md-style documents and structured records whose machine-usable evidence…

Review: pending
Role: unreviewed
Read: now

huggingface Score 13.4

Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization

2026-05-01 · Zi-Bo Qin, Feng-Feng Wei, Tai-You Chen, Wei-Neng Chen

General AI

Distributed black-box consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns, which often struggle…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.2

Let ViT Speak: Generative Language-Image Pre-training

2026-05-01 · Yan Fang, Mengcheng Lan, Zilong Huang, Weixian Lei, Yunqing Zhao, Yujie Zhong, Yingchen Yu, Qi She, Yao Zhao, Yunchao Wei

General AI

In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining framework for Vision Transformers (ViTs) designed for multimodal large language models (MLLMs). To better align vision encoders with the autoregressive nature of LLMs, GenLI…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.2

Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion

2026-05-01 · Yuan Li, Jun Hu, Jiaxin Jiang, Bryan Hooi, Bingsheng He

General AI

Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constra…

Review: pending
Role: unreviewed
Read: now

arxiv Score 13.2

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

2026-05-01 · Sailesh Panda, Pritam Kadasi, Abhishek Upperwal, Mayank Singh

General AI

Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithfully execute the procedure specified in a prompt. We study this question through a controlled diagnostic benchmark for procedural execution, where models are given a st…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.9

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

2026-05-01 · Dongxin Guo, Jikun Wu, Siu Ming Yiu

Research Track B · General AI

AI agents execute tens to hundreds of chained LLM calls per task, yet GPU schedulers treat each call as independent, discarding gigabytes of intermediate state between steps and inflating end-to-end latency by 3-8x. We argue that this request-level abstraction is fundamentally mismatched to compound AI workloads, and p…

Review: pending
Role: unreviewed
Read: now

arxiv Score 12.2

Generating Statistical Charts with Validation-Driven LLM Workflows

2026-05-01 · Pavlin G. Poličar, Andraž Pevcin, Blaž Zupan

General AI

Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-ans…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.4

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

2026-05-01 · Indraneil Paul, Goran Glavaš, Iryna Gurevych

General AI

Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choi…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.2

FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

2026-05-01 · Yutao Hou, Yihan Jiang, Yuhan Xie, Jian Yang, Liwen Zhang, Hailiang Huang, Guanhua Chen, Yun Chen

General AI

Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chi…

Review: pending
Role: unreviewed
Read: now

arxiv Score 11.2

Position: agentic AI orchestration should be Bayes-consistent

2026-05-01 · Theodore Papamarkou, Pierre Alquier, Matthias Bauer, Wray Buntine, Andrew Davison, Gintare Karolina Dziugaite, Maurizio Filippone, Andrew Y. K. Foong, Vincent Fortuin, Dimitris Fouskakis, Jes Frellsen, Eyke Hüllermeier, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Nikita Kotelevskii, Salem Lahlou, Yingzhen Li, Fang Liu, Clare Lyle, Thomas Möllenhoff, Konstantina Palla, Maxim Panov, Yusuf Sale, Kajetan Schweighofer, Artem Shelmanov, Siddharth Swaroop, Martin Trapp, Willem Waegeman, Andrew Gordon Wilson, Alexey Zaytsev

General AI

LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this p…

Review: pending
Role: unreviewed
Read: now

huggingface Score 11.0

AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval

2026-04-25 · Yihan Wang, Lei Li, Yao Lai, Jing Wang, Yan Lu

General AI

Analog circuit design relies heavily on reusing existing intellectual property (IP), yet searching across heterogeneous representations such as SPICE netlists, schematics, and functional descriptions remains challenging. Existing methods are largely limited to exact matching within a single modality, failing to capture…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 10.2

HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs

2026-05-01 · Jinpai Zhao, Nishant Panda, Yen Ting Lin, Eirik Valseth, Diane Oyen, Clint Dawson

General AI

We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closures, boundary handling) in a query-conditioned way. Rather than learning a monolithic map, HyCOP learns a policy over short programs - which module to apply and for how l…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 9.4

Online Self-Calibration Against Hallucination in Vision-Language Models

2026-05-01 · Minghui Chen, Chenxu Yang, Hengjie Zhu, Dayan Wu, Zheng Lin, Qingyi Si

General AI

Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Pe…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 9.4

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

2026-05-01 · Houyuan Chen, Hong Li, Xianghao Kong, Tianrui Zhu, Shaocong Xu, Weiqing Xiao, Yuwei Guo, Chongjie Ye, Lvmin Zhang, Hao Zhao, Anyi Rao

General AI

Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unif…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 8.2

Modeling Subjective Urban Perception with Human Gaze

2026-05-01 · Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer

General AI

Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed.…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 8.2

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

2026-05-01 · Alfredo Madrid-García, Miguel Rujas

General AI

Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls. Objective: To re…

Review: pending
Role: unreviewed
Read: soon

huggingface Score 8.0

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

2026-04-26 · Zhen Ye, Xu Tan, Aoxiong Yin, Hongzhan Lin, Guangyan Zhang, Peiwen Sun, Yiming Li, Chi-Min Chan, Wei Ye, Shikun Zhang, Wei Xue

General AI

Joint audio-video generation models have shown that unified generation yields stronger cross-modal coherence than cascaded approaches. However, existing models couple modalities throughout denoising via pervasive attention, treating high-level semantics and low-level details in a fully entangled manner. This is subopti…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 7.2

EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure

2026-05-01 · Zihao Ding, Beining Wu, Jun Huang

General AI

Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private. However, joint embedding training entangles forgotten knowledge across both modalities and client gradient subspaces, hindering federated unlearning. Previous federated unlearning appr…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 7.2

GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

2026-05-01 · Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb

General AI

Gaze estimation methods commonly use facial appearance to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based methods, including late fusion of image feat…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 7.2

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

2026-05-01 · Shradha Sharma, Swapnil Dhamal, Shweta Jain

General AI

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributi…

Review: pending
Role: unreviewed
Read: soon

arxiv Score 6.2

Deep Kernel Learning for Stratifying Glaucoma Trajectories

2026-05-01 · Bruce Rushing, Angela Danquah, Alireza Namazi, Arjun Dirghangi, Heman Shakeri

General AI

Effectively stratifying patient risk in chronic diseases like glaucoma is a major clinical challenge. Clinicians need tools to identify patients at high risk of progression from sparse and irregularly-sampled electronic health records (EHRs). We propose a novel deep kernel learning (DKL) architecture that leverages a G…

Review: pending
Role: unreviewed
Read: later

arxiv Score 6.2

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

2026-05-01 · Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan

General AI

Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over…

Review: pending
Role: unreviewed
Read: later

arxiv Score 6.2

PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

2026-05-01 · Guandong Li, Mengxia Ye

General AI

Image editing instructions are heterogeneous: a color swap, an object insertion, and a physical-action edit all demand different spatial coverage and different reasoning depth, yet existing reasoning-based editors apply a single fixed inference recipe to every instruction. We argue that adaptivity along both the spatia…

Review: pending
Role: unreviewed
Read: later

huggingface Score 5.4

Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

2026-05-01 · Yi Wang, Xinchen Li, Pengwei Xie, Pu Yang, Buqing Nie, Yunuo Cai, Qinglin Zhang, Chendi Qu, Jeffrey Wu, Jianheng Song, Xinlin Ren, Jingshun Huang, Mingjie Pan, Siyuan Feng, Zhi Chen, Jianlan Luo

General AI

Generalist robot policies increasingly benefit from large-scale pretraining, but offline data alone is insufficient for robust real-world deployment. Deployed robots encounter distribution shifts, long-tail failures, task variations, and human correction opportunities that fixed demonstration datasets cannot fully capt…

Review: pending
Role: unreviewed
Read: later

arxiv Score 5.2

Penalized Likelihood for Dyadic Network Formation Models with Degree Heterogeneity

2026-05-01 · Zizhong Yan, Jingrong Li, Yi Zhang

General AI

Estimating network formation models with degree heterogeneity raises two problems in empirical networks. First, agents that send no links, receive no links, or link to all remaining agents can make the fixed-effects MLE fail to exist. Trimming these agents changes the estimation sample and induces selection bias. Secon…

Review: pending
Role: unreviewed
Read: later

arxiv Score 5.2

Posterior Augmented Flow Matching

2026-05-01 · George Stoica, Sayak Paul, Matthew Wallingford, Vivek Ramanujan, Abhay Nori, Winson Han, Ali Farhadi, Ranjay Krishna, Judy Hoffman

General AI

Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and intermediate point, yielding an extremely sparse and high-variance training signal. This …

Review: pending
Role: unreviewed
Read: later

arxiv Score 5.2

Simpson's paradox explains the ubiquity of nonlinear, threshold, and complex contagions

2026-05-01 · Laurent Hébert-Dufresne, Antoine Allard, Jean-Gabriel Young, William H. W. Thompson, Guillaume St-Onge

General AI

Complex contagions describe systems where the probability or rate of contagious transmission is a nonlinear function of the exposure to contagious agents. These models were first studied theoretically but have since been used to capture effects such as nonconformism, social reinforcement or peer pressure in empirical d…

Review: pending
Role: unreviewed
Read: later

arxiv Score 5.2

Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks

2026-05-01 · Jingxi Pu, Tonghua Liu, Zhilin Guan, Siqiao Li, Yang Ming, Zheng Cong, Wei Zhang, Fangwei Li

General AI

With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising problem of low-dose computed tomography using deep learning. Although low-dose computed tomography reduces radiation exposure to patients, it also introduces more noise, …

Review: pending
Role: unreviewed
Read: later
