arxiv
Score 35.5
2026-04-15 · Noureddine Kermiche
Research Track A · General AI
Catastrophic forgetting remains a primary hurdle in sequential task learning for artificial neural networks. We propose a silicon-native modular architecture that achieves structural parameter isolation using Task-Specific Experts and a distributed, outlier-based Gatekeeper. Moving beyond traditional sequential consoli…
arxiv
Score 30.5
2026-03-12 · Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin
Research Track A · General AI
Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophi…
arxiv
Score 30.0
2026-04-17 · Alexandra Dragomir, Ioana Pintilie, Antonio Barbalau, Marius Dragoi, Florin Brad, Cristian Daniel Paduraru, Alexandru Tifrea, Elena Burceanu, Radu Tudor Ionescu
Research Track A · General AI
Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs), by sequentially learning a low-rank update matrix for each task. To mitigate catastrophic forgetting, state-of-the-art approaches impose constraints on new adapters with respect to the previous ones,…
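The abstract describes each task getting its own low-rank update matrix. The sketch below shows only that parameterization: the standard LoRA form W0 + B_t A_t, accumulated over tasks on top of a frozen base weight. It is not the authors' constraint method. The helper names (`matmul`, `add`, `effective_weight`), the shapes, and the toy values are all hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def effective_weight(w0: Matrix, adapters: List[Tuple[Matrix, Matrix]]) -> Matrix:
    """Frozen base weight plus one low-rank update B_t @ A_t per learned task."""
    w = w0
    for b, a in adapters:  # b: d_out x r, a: r x d_in
        w = add(w, matmul(b, a))
    return w

# Toy 2x2 frozen base weight and two rank-1 task adapters (hypothetical values).
w0 = [[1.0, 0.0], [0.0, 1.0]]
task1 = ([[1.0], [0.0]], [[0.5, 0.5]])   # B_1 (2x1), A_1 (1x2)
task2 = ([[0.0], [2.0]], [[1.0, -1.0]])  # B_2 (2x1), A_2 (1x2)
w = effective_weight(w0, [task1, task2])
print(w)  # [[1.5, 0.5], [2.0, -1.0]]
```

Each adapter stores r(d_in + d_out) numbers instead of d_in × d_out. That saving is what makes per-task adapters cheap at realistic sizes, although the 2×2 toy here is too small to show it.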
arxiv
Score 29.0
2026-04-09 · Xing Han Lù, Siva Reddy
Research Track B · General AI
Frontier LLMs can navigate complex websites, but their cost and reliance on third-party APIs make local deployment impractical. We introduce Agent-as-Annotators, a framework that structures synthetic trajectory generation for web agents by analogy to human annotation roles, replacing the Task Designer, Annotator, and S…
arxiv
Score 28.0
2026-03-20 · Zijian Lu, Yiping Zuo, Yupeng Nie, Xin He, Weibei Fan, Chen Dai, Lianyong Qi, Shi Jin
Research Track B · General AI
Despite rapid progress in multimodal GUI agents, reusable skill acquisition remains difficult because on-demand generated skills often leave action semantics, state assumptions, and success criteria implicit. This makes them brittle to execution errors, hard to verify, and difficult to repair. We present ContractSkill,…
arxiv
Score 27.3
2026-04-22 · Pavel Salovskii, Iuliia Gorshkova
General AI
This paper presents a hybrid architecture for intelligent systems in which large language models (LLMs) are extended with an external ontological memory layer. Instead of relying solely on parametric knowledge and vector-based retrieval (RAG), the proposed approach constructs and maintains a structured knowledge graph …
arxiv
Score 26.8
2026-03-31 · Yinuo Liu, Zi Qian, Heng Zhou, Jiahao Zhang, Yajie Zhang, Zhihang Li, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang
General AI
Interleaved text-and-image generation represents a significant frontier for Multimodal Large Language Models (MLLMs), offering a more intuitive way to convey complex information. Current paradigms rely on either image generation or retrieval augmentation, yet they typically treat the two as mutually exclusive paths, fa…
arxiv
Score 26.4
2026-05-01 · Beining Wu, Zihao Ding, Jun Huang
Research Track A · General AI
While current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) is built on the unverified assumption that routing isolates task-specific knowledge into disjoint experts, we argue that routing operates per-sample, while forgetting accumulates across the task sequence, and gr…
arxiv
Score 26.0
2026-04-10 · Xingyu Shao, Zhiqiang Yan, Liangzheng Sun, Mengfan He, Chao Chen, Jinhui Zhang, Chunyu Li, Ziyang Meng
Research Track A · General AI
Robust geo-localization in changing environmental conditions is critical for long-term aerial autonomy. While visual place recognition (VPR) models perform well when airborne views match the training domain, adapting them to shifting distributions during sequential missions triggers catastrophic forgetting. Existing co…
arxiv
Score 26.0
2026-04-23 · Nicolae Filat, Ahmed Hussain, Konstantinos Kalogiannis, Elena Burceanu
Research Track A · General AI
Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same stream can induce d…
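The abstract claims that different valid temporal splits of one stream induce different task sequences. A small hypothetical sketch makes this concrete: one labeled stream is cut at two boundary choices, and each resulting task's label set is listed. The stream, the boundaries, and the function `taskify` are illustrative assumptions, not the paper's protocol.

```python
from typing import List, Sequence

def taskify(stream: Sequence[str], boundaries: Sequence[int]) -> List[List[str]]:
    """Cut a stream at the given indices; return each task's sorted label set."""
    cuts = [0, *boundaries, len(stream)]
    return [sorted(set(stream[start:end])) for start, end in zip(cuts, cuts[1:])]

# One stream of class labels, in arrival order (toy data).
stream = ["cat", "cat", "dog", "dog", "cat", "bird", "bird", "dog"]

split_a = taskify(stream, [4])     # two tasks
split_b = taskify(stream, [2, 5])  # three tasks
print(split_a)  # [['cat', 'dog'], ['bird', 'cat', 'dog']]
print(split_b)  # [['cat'], ['cat', 'dog'], ['bird', 'dog']]
```

Both splits partition the same eight observations. Yet `split_a` yields two heavily overlapping tasks, while `split_b` yields three tasks, the first containing a single class. Any per-task accuracy or forgetting metric is therefore computed over different task definitions depending on the split.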
arxiv
Score 25.8
2026-05-07 · Hanxiang Chao, Yihan Bai, Rui Sheng, Tianle Li, Yushi Sun
Research Track A · General AI
Large Language Model (LLM) agents are increasingly expected to maintain coherent, long-term personalized memory, yet current benchmarks primarily measure static fact retrieval, overlooking the ability to revise stored beliefs when new evidence emerges. We identify a critical and underexplored failure mode, Implicit Con…
huggingface
Score 25.5
2026-04-15 · Xiaohua Wang, Muzhao Tian, Yuqi Zeng, Zisu Huang, Jiakang Yuan, Bowen Chen, Jingwen Xu, Mingbo Zhou, Wenhao Liu, Muling Wu, Zhengkang Guo, Qi Qian, Yifei Wang, Feiran Zhang, Ruicheng Yin, Shihan Dou, Changze Lv, Tao Chen, Kaitao Song, Xu Tan, Tao Gui, Xiaoqing Zheng, Xuanjing Huang
General AI
Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and multimodal large language models (MLLMs) toward human-preferred behaviors. However, these approaches introduce a systemic vulnerability: reward hacking, where models exploit…
arxiv
Score 25.5
2026-05-12 · Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, Joseph E. Gonzalez, Matei Zaharia, Kurt Keutzer, Inderjit S Dhillon, Rishabh Agarwal, Devvrit Khatri
Research Track A · General AI
Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM parameters can chea…
arxiv
Score 25.3
2026-04-17 · Dian Shao, Zhengzheng Xu, Peiyang Wang, Like Liu, Yule Wang, Jieqi Shi, Jing Huo
General AI
UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi-step instructions over long horizons. Existing zero-shot methods remain limited, as they often rely on large base models, generic prompts, and loosely coordinated mod…
arxiv
Score 25.3
2026-04-21 · Shuai Wang, Hongyi Zhu, Jia-Hong Huang, Yixian Shen, Chengxi Zeng, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring
General AI
Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence groun…
arxiv
Score 25.0
2026-04-27 · Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
Research Track A · General AI
Continual learning for large language models is typically evaluated through accuracy retention under sequential fine-tuning. We argue that this perspective is incomplete, because uncertainty reliability can degrade earlier and more sharply than top-1 performance. We study this empirically by measuring conformal coverag…
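Conformal coverage, the quantity this abstract says is measured, can be computed with standard split conformal prediction, in three steps:

- Calibrate a threshold on nonconformity scores.
- Form prediction sets.
- Count how often the true label lands inside its set.

This is a minimal sketch under the following assumptions, and not necessarily the paper's exact protocol:

- Class probabilities come in softmax style.
- The nonconformity score is the common choice 1 - p(true label).
- The toy data and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Probs = Dict[str, float]  # class label -> predicted probability

def calibrate(cal: Sequence[Tuple[Probs, str]], alpha: float) -> float:
    """Split-conformal threshold q-hat for target coverage 1 - alpha."""
    # Nonconformity score: 1 - probability assigned to the true label.
    scores = sorted(1.0 - probs[label] for probs, label in cal)
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1)(1 - alpha)), capped at n.
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    return scores[k - 1]

def prediction_set(probs: Probs, qhat: float) -> Set[str]:
    """Every label whose score 1 - p does not exceed the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}

def coverage(data: Sequence[Tuple[Probs, str]], qhat: float) -> float:
    """Fraction of points whose true label lies inside its prediction set."""
    hits = sum(label in prediction_set(probs, qhat) for probs, label in data)
    return hits / len(data)

# Toy calibration and test data (hypothetical model outputs).
cal = [
    ({"a": 0.9, "b": 0.1}, "a"),
    ({"a": 0.6, "b": 0.4}, "a"),
    ({"a": 0.3, "b": 0.7}, "b"),
    ({"a": 0.2, "b": 0.8}, "a"),
]
test = [
    ({"a": 0.7, "b": 0.3}, "a"),
    ({"a": 0.45, "b": 0.55}, "a"),
    ({"a": 0.1, "b": 0.9}, "b"),
]
qhat = calibrate(cal, alpha=0.5)
cov = coverage(test, qhat)
print(qhat, cov)  # 0.4 0.6666666666666666
```

Here `qhat` is calibrated once and then held fixed. Rerunning `coverage` on fresh held-out data after each sequential fine-tuning stage would show how far coverage drifts from the 1 - alpha target. That is one plausible way to track the kind of degradation the abstract describes.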
arxiv
Score 25.0
2026-05-12 · Xinrui Wang, Shao-Yuan Li, Bartłomiej Twardowski, Alexandra Gomez-Villa, Songcan Chen
Research Track A · General AI
Online Continual Learning (OCL) aims to learn from endless non-stationary data streams, yet most existing methods assume a flat label space and overlook the hierarchical organization of real-world concepts that evolves both horizontally (sibling classes) and vertically (coarse or fine categories). To bett…
arxiv
Score 25.0
2026-05-12 · Patryk Krukowski, Jacek Tabor, Przemysław Spurek, Marek Śmieja, Łukasz Struski
Research Track A · General AI
Data-free continual learning (DFCIL) relies on model inversion to synthesize pseudo-samples and mitigate catastrophic forgetting. However, existing inversion methods are fundamentally limited by a simplifying assumption: they model feature distributions using diagonal covariance, effectively ignoring correlations that …
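The limitation this abstract names, that a diagonal covariance discards feature correlations, is easy to see numerically. This hypothetical sketch does the following:

- Fits a 2-D Gaussian to correlated toy features.
- Samples pseudo-features once with the full covariance, using its Cholesky factor.
- Samples again with only the diagonal of that covariance.

The correlation survives only in the first case. The sketch illustrates the stated assumption and is not the paper's inversion method. The feature values and function names are invented.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

def mean_cov(xs: List[Point]) -> Tuple[Point, List[List[float]]]:
    """Sample mean and (biased) 2x2 covariance of 2-D points."""
    n = len(xs)
    mx = sum(x for x, _ in xs) / n
    my = sum(y for _, y in xs) / n
    sxx = sum((x - mx) ** 2 for x, _ in xs) / n
    syy = sum((y - my) ** 2 for _, y in xs) / n
    sxy = sum((x - mx) * (y - my) for x, y in xs) / n
    return (mx, my), [[sxx, sxy], [sxy, syy]]

def sample(mu: Point, cov: List[List[float]], n: int, diagonal: bool,
           rng: random.Random) -> List[Point]:
    """Draw Gaussian samples; with diagonal=True the off-diagonal term is dropped."""
    a, b, d = cov[0][0], (0.0 if diagonal else cov[0][1]), cov[1][1]
    # Cholesky factor of [[a, b], [b, d]] is [[l11, 0], [l21, l22]].
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return out

def correlation(xs: List[Point]) -> float:
    """Pearson correlation between the two coordinates."""
    _, c = mean_cov(xs)
    return c[0][1] / math.sqrt(c[0][0] * c[1][1])

rng = random.Random(0)
# Toy "real" features with strong positive correlation (y = 0.9 x + noise).
real = []
for _ in range(2000):
    x = rng.gauss(0, 1)
    real.append((x, 0.9 * x + 0.3 * rng.gauss(0, 1)))
mu, cov = mean_cov(real)
full = sample(mu, cov, 2000, diagonal=False, rng=rng)
diag = sample(mu, cov, 2000, diagonal=True, rng=rng)
# real and full both show correlation near 0.95; diag is near 0.
print(round(correlation(real), 2), round(correlation(full), 2), round(correlation(diag), 2))
```

Pseudo-samples drawn from the diagonal model have the right per-feature spread but lose the joint structure entirely. That is the kind of information gap the abstract attributes to existing inversion methods.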
huggingface
Score 24.8
2026-05-11 · Shijue Huang, Hangyu Guo, Chenxin Li, Junting Lu, Xinyu Geng, Zhaochen Su, Zhenyu Li, Shuang Chen, Hongru Wang, Yi R. Fung
General AI
Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as transient outputs, …
arxiv
Score 24.5
2026-03-16 · Zhaohui Geoffrey Wang
Research Track A · General AI
A critical failure mode of current lifelong agents is not lack of knowledge, but the inability to decide how to reason. When an agent encounters "Is this coin fair?", it must recognize whether to invoke frequentist hypothesis testing or Bayesian posterior inference, frameworks that are epistemologically incompatible. M…
huggingface
Score 24.5
2026-04-12 · Mikhail Menschikov, Dmitry Evseev, Victoria Dochkina, Ruslan Kostoev, Ilia Perepechkin, Petr Anokhin, Nikita Semenov, Evgeny Burnaev
General AI
Personalizing language models by effectively incorporating user interaction history remains a central challenge in the development of adaptive AI systems. While large language models (LLMs), combined with Retrieval-Augmented Generation (RAG), have improved factual accuracy, they often lack structured memory and fail to…
arxiv
Score 24.5
2026-04-16 · Cuong Hoang, Le-Minh Nguyen
Research Track A · General AI
The proliferation of financial misinformation poses a severe threat to market stability and investor trust, misleading market behavior and creating critical information asymmetry. Detecting such misleading narratives is inherently challenging, particularly in real-world scenarios where external evidence or supplementar…
arxiv
Score 24.3
2026-05-12 · Seokwon Jung, Alexander Rubinstein, Arnas Uselis, Sangdoo Yun, Seong Joon Oh
General AI
LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not …
arxiv
Score 24.2
2026-04-30 · Jing Zhang, Wentao Jiang, Tao Huang, Zhiwei Wang, Jianxin Liu, Jian Chen, Ping Ye, Gang Wang, Zengmao Wang, Bo Du, Dacheng Tao
General AI
Ultrasound interpretation requires both precise lesion localization and holistic clinical reasoning, yet existing methods typically excel at only one of these capabilities: specialized detectors offer strong localization but limited reasoning, whereas multimodal large language models (MLLMs) provide flexible reasoning …
arxiv
Score 24.0
2026-04-28 · Dominik Żurek, Kamil Faber, Marcin Pietron, Paweł Gajewski, Roberto Corizzo
Research Track A · General AI
Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time, but adapting the model in live environment interactions is expensive, risky, …
arxiv
Score 23.8
2026-03-07 · Yunteng Tan, Zhi Gao, Xinxiao Wu
Research Track B · General AI
Large language model-based web agents have shown strong potential in automating web interactions through advanced reasoning and instruction following. While retrieval-based memory derived from historical trajectories enables these agents to handle complex, long-horizon tasks, current methods struggle to generalize acro…
arxiv
Score 23.8
2026-04-06 · Shu Wang, Edwin Yu, Oscar Love, Tom Zhang, Tom Wong, Steve Scargall, Charles Fan
General AI
Large Language Model (LLM) agents require persistent memory to maintain personalization, factual continuity, and long-horizon reasoning, yet standard context-window and retrieval-augmented generation (RAG) pipelines degrade over multi-session interactions. We present MemMachine, an open-source memory system that integr…
arxiv
Score 23.5
2026-04-14 · Jagadeesh Rachapudi, Ritali Vatsi, Praful Hambarde, Amit Shukla
Research Track A · General AI
Recent advances in deep learning underscore the need for systems that can not only acquire new knowledge through Continual Learning (CL) but also remove outdated, sensitive, or private information through Machine Unlearning (MU). However, while CL methods are well-developed, MU techniques remain in early stages, creati…
arxiv
Score 23.5
2026-05-06 · Alaa Zniber, Ouassim Karrakchou, Mounir Ghogho
Research Track A · General AI
Early-exiting neural networks enable adaptive inference by allowing inputs to exit at intermediate classifiers, reducing computation for easy samples while maintaining high accuracy. In practice, exits can be trained sequentially by incrementally adding them to a shared backbone; however, this sequential training can c…
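The adaptive-inference rule in this abstract lets an input leave at the first intermediate classifier that is confident enough. It can be sketched under these assumptions:

- Each exit returns class logits.
- Confidence is the maximum softmax probability.
- A single global threshold applies to every exit.

The toy exits and `early_exit_predict` are hypothetical, and the paper's sequential training of the exits is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Exit = Callable[[Sequence[float]], List[float]]  # features -> class logits

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(x: Sequence[float], exits: Sequence[Exit],
                       threshold: float) -> Tuple[int, int]:
    """Return (predicted class, index of the exit that answered)."""
    for i, head in enumerate(exits):
        probs = softmax(head(x))
        conf = max(probs)
        # Leave at the first confident exit; the final exit always answers.
        if conf >= threshold or i == len(exits) - 1:
            return probs.index(conf), i
    raise ValueError("at least one exit is required")

# Hypothetical exits of increasing depth (toy linear heads on 2 features).
exits: List[Exit] = [
    lambda x: [x[0], -x[0]],
    lambda x: [x[0] + x[1], -(x[0] + x[1])],
    lambda x: [10 * (x[0] + x[1]), -10 * (x[0] + x[1])],
]
print(early_exit_predict([3.0, 0.0], exits, threshold=0.9))  # (0, 0): easy input exits early
print(early_exit_predict([0.1, 0.2], exits, threshold=0.9))  # (0, 2): hard input reaches the end
```

In a real early-exit network the heads sit on a shared backbone, so stopping early skips the remaining backbone layers. The independent toy heads here only illustrate the exit decision.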
arxiv
Score 23.3
2026-04-21 · Josue Torres-Fonseca, Naihao Deng, Yinpei Dai, Shane Storks, Yichi Zhang, Rada Mihalcea, Casey Kennington, Joyce Chai
General AI
Multimodal Large Language Models are increasingly adopted as autonomous agents in interactive environments, yet their ability to proactively address safety hazards remains insufficient. We introduce SafetyALFRED, built upon the embodied agent benchmark ALFRED, augmented with six categories of real-world kitchen hazards…
arxiv
Score 23.3
2026-04-27 · Soyeon Kim, Cheongwoong Kang, Myeongjin Lee, Eun-Chul Chang, Jaedeok Lee, Jaesik Choi
General AI
The development of practical (multimodal) large language model assistants for Korean weather forecasters is hindered by the absence of a multidimensional, expert-level evaluation framework grounded in authoritative sources. To address this, we introduce K-MetBench, a diagnostic benchmark grounded in national qualificat…
arxiv
Score 23.2
2026-04-29 · GLM-V Team, Wenyi Hong, Xiaotao Gu, Ziyang Pan, Zhen Yang, Yuting Wang, Yue Wang, Yuanchang Yue, Yu Wang, Yanling Wang, Yan Wang, Xijun Liu, Wenmeng Yu, Weihan Wang, Wei Li, Shuaiqi Duan, Sheng Yang, Ruiliang Lv, Mingdao Liu, Lihang Pan, Ke Ning, Junhui Ji, Jinjiang Wang, Jing Chen, Jiazheng Xu, Jiale Zhu, Jiale Cheng, Ji Qi, Guobing Gan, Guo Wang, Cong Yao, Zijun Dou, Zihao Zhou, Zihan Wang, Zhiqi Ge, Zhijie Li, Zhenyu Hou, Zhao Xue, Zehui Wang, Zehai He, Yusen Liu, Yukuo Cen, Yuchen Li, Yuan Wang, Yijian Lu, Yanzi Wang, Yadong Xue, Xinyu Zhang, Xinyu Liu, Wenkai Li, Tianyu Tong, Tianshu Zhang, Shengdong Yan, Qinkai Zheng, Mingde Xu, Licheng Bao, Jiaxing Xu, Jiaxin Fan, Jiawen Qian, Jiali Chen, Jiahui Lin, Haozhi Zheng, Haoran Wang, Haochen Li, Fan Yang, Dan Zhang, Chuangxin Zhao, Chengcheng Wu, Boyan Shi, Bowei Jia, Baoxu Wang, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Minlie Huang, Yuxiao Dong, Jie Tang, V Team
General AI
We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, video…
arxiv
Score 23.0
2026-03-13 · Hongyang Chen, Zhongwu Sun, Hongfei Ye, Kunchi Li, Xuemin Lin
Research Track A · General AI
Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting, a critical limitation of the static pre-training paradigm inherent to modern LLMs. This survey presents a comprehensiv…
arxiv
Score 23.0
2026-03-31 · Michael Chertkov
Research Track A · General AI
An agent that operates sequentially must incorporate new experience without forgetting old experience, under a fixed memory budget. We propose a framework in which memory is not a parameter vector but a stochastic process: a Bridge Diffusion on a replay interval $[0,1]$, whose terminal marginal encodes the present and …
huggingface
Score 23.0
2026-04-06 · Jingyang Qiao, Weicheng Meng, Yu Cheng, Zhihang Lin, Zhizhong Zhang, Xin Tan, Jingyu Gong, Kun Shao, Yuan Xie
General AI
Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, while suffering from key li…
arxiv
Score 23.0
2026-04-08 · Radu Negulescu
Research Track A · General AI
Catastrophic forgetting is not an engineering failure. It is a mathematical consequence of storing knowledge as global parameter superposition. Existing methods, such as regularization, replay, and frozen subnetworks, add external mechanisms to a shared-parameter substrate. None derives retention from the learning dyna…
arxiv
Score 22.8
2026-03-20 · Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette
Research Track B · General AI
Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing L…
arxiv
Score 22.8
2026-04-02 · Srivaths Ranganathan, Abhishek Dharmaratnakar, Anushree Sinha, Debanshu Das
General AI
Video recommender systems are among the most popular and impactful applications of AI, shaping content consumption and influencing culture for billions of users. Traditional single-model recommenders, which optimize static engagement metrics, are increasingly limited in addressing the dynamic requirements of modern pla…
arxiv
Score 22.8
2026-04-09 · Wenbo Hu, Xin Chen, Yan Gao-Tian, Yihe Deng, Nanyun Peng, Kai-Wei Chang
General AI
Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challenges: the extreme vari…
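The ingredient that distinguishes GRPO from PPO-style training is the group-relative advantage. Several responses are sampled for the same prompt, and each reward is normalized by that group's mean and standard deviation, which removes the need for a learned value model. Below is a minimal sketch of that normalization step only. The clipped policy-ratio objective and the KL term are omitted, and the reward values and `eps` are illustrative.

```python
import statistics
from typing import List, Sequence

def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each sampled response's reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ on this detail
    return [(r - mean) / (std + eps) for r in rewards]

# Four responses sampled for one prompt, scored by a verifier (1 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_relative_advantages(rewards)
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
```

Suppose every response in a group receives the same reward. Then every advantage is zero, so that prompt contributes no learning signal in that step.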
arxiv
Score 22.8
2026-05-07 · Bodong Du, Bowen Liu, Yang Yu, Xinpeng Ding, Zhiheng Wu, Shuning Wang, Shuo Nie, Naiming Liu, Qifeng Chen, Yangqiu Song, Xiaomeng Li
General AI
Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures contain highly redundant anatomical views, while decisive evidence is temporally sparse,…
huggingface
Score 22.5
2026-03-20 · Chang Nie, Chaoyou Fu, Yifan Zhang, Haihua Yang, Caifeng Shan
General AI
Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture use…
huggingface
Score 22.5
2026-04-20 · Xinping Lei, Xinyu Che, Junqi Xiong, Chenchen Zhang, Yukai Huang, Chenyu Zhou, Haoyang Huang, Minghao Liu, Letian Zhu, Hongyi Ye, Jinhua Hao, Ken Deng, Zizheng Zhan, Han Li, Dailin Li, Yifan Yao, Ming Sun, Zhaoxiang Zhang, Jiaheng Liu
General AI
Large language models are rapidly evolving into interactive coding agents capable of end-to-end web coding, yet existing benchmarks evaluate only narrow slices of this capability, typically text-conditioned generation with static-correctness metrics, leaving visual fidelity, interaction quality, and codebase-level reas…
arxiv
Score 22.5
2026-05-08 · Donguk Kwon, Dongha Lee
Research Track B · General AI
Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as the action space, leaving the page's functional organization implicit and forcing the agent to infer it from element-leve…
huggingface
Score 22.4
2026-05-05 · You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei
General AI
Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines that can perceive, generate, and interact in the multimodal real world. In the era of large foundation models, joint modeling of audio and vision has become increasing…
arxiv
Score 22.3
2026-04-22 · Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng, Fengying Xie, Zhiguo Jiang, Cao Liu, Ke Zeng, Haopeng Zhang
General AI
We introduce V-tableR1, a process-supervised reinforcement learning framework that elicits rigorous, verifiable reasoning from multimodal large language models (MLLMs). Current MLLMs trained solely on final outcomes often treat visual reasoning as a black box, relying on superficial pattern matching rather than perform…
arxiv
Score 22.3
2026-05-12 · Runhui Huang, Jie Wu, Rui Yang, Zhe Liu, Hengshuang Zhao
General AI
In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model's intrinsic potential to perform advanced re…
arxiv
Score 22.2
2026-04-29 · Mingji Ge, Qirui Chen, Zeqian Li, Weidi Xie
General AI
Long-term video understanding requires interpreting complex temporal events and reasoning over procedural activities. While instructional video corpora, like HowTo100M, offer rich resources for model training, they present significant challenges, including noisy ASR transcripts and inconsistent temporal alignments betw…
arxiv
Score 22.0
2026-04-06 · Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta
Research Track A · General AI
Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, like full finetuning or parameter-efficient methods (e.g., LoRA), face a fundamental trade-off: cat…
arxiv
Score 22.0
2026-04-07 · Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz
Research Track B · General AI
Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully exe…
arxiv
Score 22.0
2026-04-22 · Noah Flynn
Research Track A · General AI
Large language models (LLMs) often exhibit performance disparities across languages, with naive multilingual fine-tuning frequently degrading performance due to negative cross-lingual interference. To address this, we introduce COMPASS (COntinual Multilingual PEFT with Adaptive Semantic Sampling), a novel data-centric …
arxiv
Score 22.0
2026-05-07 · Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari
Research Track A · General AI
Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in thre…
arxiv
Score 21.9
2026-04-29 · Fazle Elahi Faisal, Qianhui Wu, Baolin Peng, Jianfeng Gao
Research Track B · General AI
Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-quality web trajectory training data. Existing automatic trajectory generation methods suffer from incomplete website cov…
arxiv
Score 21.8
2026-03-23 · Shoubin Yu, Lei Shu, Antoine Yang, Yao Fu, Srinivas Sunkara, Maria Wang, Jindong Chen, Mohit Bansal, Boqing Gong
Research Track B · General AI
Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution. However, current web-agent benchmarks suffer from a critical limitation: they focus entirely on web-based interaction and perception, lacking grounding in the user's real-world physical surroundings. This li…
arxiv
Score 21.8
2026-04-09 · Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang, Kunyu Shi, Guannan Zhang, Ruixuan Li, Yixiong Zou
General AI
The advent of agentic multimodal models has empowered systems to actively interact with external environments. However, current agents suffer from a profound meta-cognitive deficit: they struggle to arbitrate between leveraging internal knowledge and querying external utilities. Consequently, they frequently fall prey …
huggingface
Score 21.5
2026-04-13 · Junfu Pu, Yuxin Chen, Teng Wang, Ying Shan
General AI
Current multimodal large language models (MLLMs) have demonstrated remarkable capabilities in short-form video understanding, yet translating long-form cinematic videos into detailed, temporally grounded scripts remains a significant challenge. This paper introduces the novel video-to-script (V2S) task, aiming to gener…
huggingface
Score 21.5
2026-04-18 · Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Dingyu Yao, Zheng Lin, Peng Fu, Nan Duan, Jiaqi Wang
General AI
Reinforcement learning from verifiable rewards (RLVR) has demonstrated remarkable effectiveness in improving the reasoning capabilities of large language models. As models evolve into natively multimodal architectures, extending RLVR to video understanding becomes increasingly important yet remains largely unexplored, …
huggingface
Score 21.5
2026-04-22 · Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha
General AI
Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under delayed rewards and partial observability. Games are a good testbed for evaluating agent …
arxiv
Score 21.5
2026-04-27 · Kevin McKee, Thomas Hazy, Yicong Zheng, Zacharie Bugaud, Thomas Miconi
Research Track A · General AI
Block-sequential continual learning demands that a single model both protect prior solutions from catastrophic forgetting and efficiently infer at inference time which prior solution matches the current input without task labels. We present Functional Task Networks (FTN), a parameter-isolation method inspired by struct…
arxiv
Score 21.5
2026-05-06 · Yazheng Liu, Yuxuan Wan, Rui Xu, Xi Zhang, Sihong Xie, Hui Xiong
Research Track A · General AI
Large language models (LLMs) often suffer from catastrophic forgetting in continual learning: after learning new tasks sequentially, they perform worse on earlier tasks. Existing methods mitigate catastrophic forgetting by data replay, parameter freezing, or regularization. However, these methods lack semantic awarenes…
arxiv
Score 21.4
2026-05-03 · Matteo Gambella, Fabrizio Pittorino, Manuel Roveri
Research Track A · General AI
Neural Architecture Search (NAS) has emerged as a powerful framework for automatically discovering neural architectures that balance accuracy and efficiency. However, as AI transitions from static benchmarks to real-world deployment, the traditional focus on hardware-aware efficiency is no longer sufficient. We observe…
arxiv
Score 21.3
2026-04-13 · Keyang Zhong, Junlin Xie, Hefeng Wu, Haofeng Li, Guanbin Li
General AI
Vision-language models (VLMs) have shown impressive capabilities in perceptual tasks, yet they degrade in complex multi-hop reasoning under multiplayer game settings with imperfect and deceptive information. In this paper, we study a representative multiplayer task, Murder Mystery Games, which require inferring hidden …
arxiv
Score 21.3
2026-04-14 · Zhaofen Wu, Hanrong Zhang, Fulin Lin, Wujiang Xu, Xinran Xu, Yankai Chen, Henry Peng Zou, Shaowen Chen, Weizhi Zhang, Xue Liu, Philip S. Yu, Hongwei Wang
Research Track A · General AI
To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete…
arxiv
Score 21.3
2026-04-20 · Xingchen Xiao, Heyan Huang, Runheng Liu, Jincheng Xie
General AI
Large language models (LLMs) are widely used in retrieval-augmented generation (RAG) to incorporate external knowledge at inference time. However, when retrieved contexts are noisy, incomplete, or heterogeneous, a single generation process often struggles to reconcile evidence effectively. We propose MASS-RAG,…
arxiv
Score 21.3
2026-04-24 · Chih-Ting Liao, Xi Xiao, Chunlei Meng, Zhangquan Chen, Yitong Qiao, Weilin Zhou, Tianyang Wang, Xu Zheng, Xin Cao
General AI
Multimodal large language models (MLLMs) have advanced static visual-spatial reasoning, yet they often fail to preserve long-horizon spatial coherence in embodied settings where beliefs must be continuously revised from egocentric observations under environmental change. We introduce SpaMEM (Spatial Memory from Action…
arxiv
Score 21.3
2026-05-12 · Yuangong Chen, Wai Keung Wong, Jiaxing Li, Ioannis Patras, Xu Zheng
General AI
Multimodal Large Language Models (MLLMs) show strong visual perception, yet remain limited in reasoning about space under changing viewpoints. We study this challenge as Perspective-Conditioned Spatial Reasoning (PCSR) in 360-degree omnidirectional images, where broad scene coverage reduces ambiguity from partial obser…
arxiv
Score 21.3
2026-05-12 · Alireza Nadali, Patrick Cooper, Ashutosh Trivedi, Alvaro Velasquez
General AI
We introduce KV-Fold, a simple, training-free long-context inference protocol that treats the key-value (KV) cache as the accumulator in a left fold over sequence chunks. At each step, the model processes the next chunk conditioned on the accumulated cache, appends the newly produced keys and values, and passes the enl…
arxiv
Score 21.2
2026-04-29 · Happy Bhati
General AI
The arrival of large language models (LLMs) capable of multi-step reasoning, tool use, and long-horizon planning has produced a qualitative shift in software engineering. Where earlier code-completion tools such as GitHub Copilot operated at the granularity of a line or function, modern agentic systems -- Claude Code, …
arxiv
Score 21.2
2026-05-01 · Derong Xu, Shuochen Liu, Pengfei Luo, Pengyue Jia, Yingyi Zhang, Yi Wen, Yimin Deng, Wenlin Zhang, Enhong Chen, Xiangyu Zhao, Tong Xu
General AI
Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memor…
arxiv
Score 21.2
2026-05-01 · Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson
General AI
Single-agent systems (SAS) have become the default pattern for LLM-driven scientific workflows, but routing planning, tool use, and synthesis through a single context window comes with a well-known cost: as tool specifications and observational traces accumulate, the effective context available for each decision shrink…
huggingface
Score 21.0
2026-03-26 · Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Yong Liu, Chi Zhang
General AI
This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) in solving real-world financial problems through tool invocation of financial model context protocols. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic us…
huggingface
Score 21.0
2026-03-30 · Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen
General AI
Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasi…
huggingface
Score 21.0
2026-03-30 · Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo
General AI
Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bo…
huggingface
Score 21.0
2026-04-09 · Chuzhan Hao, Wenfeng Feng, Guochao Jiang, Guofeng Quan, Guohua Liu, Yuewei Zhang
General AI
Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcom…
arxiv
Score 21.0
2026-04-14 · Zhaoyang Wang, Qianhui Wu, Xuchao Zhang, Chaoyun Zhang, Wenlin Yao, Fazle Elahi Faisal, Baolin Peng, Si Qin, Suman Nath, Qingwei Lin, Chetan Bansal, Dongmei Zhang, Saravan Rajmohan, Jianfeng Gao, Huaxiu Yao
Research Track B · General AI
Autonomous web agents powered by large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. A key bottleneck is the grounding gap in existing skill formulations: textual workflow skills provide natural language guidance but cannot be directl…
arxiv
Score 21.0
2026-04-19 · Liangzu Peng, Uday Kiran Reddy Tadipatri, Ziqing Xu, Eric Eaton, René Vidal
Research Track A · General AI
Continual learning (CL) is concerned with learning multiple tasks sequentially without forgetting previously learned tasks. Despite substantial empirical advances over recent years, the theoretical development of CL remains in its infancy. At the heart of developing CL theory lies the challenge that the data distributi…
arxiv
Score 21.0
2026-04-20 · Lingfeng Zhang, Yongan Sun, Jinpeng Hu, Hui Ma, Yang Ying, Kuien Liu, Zenglin Shi, Meng Wang
Research Track B · General AI
Recent advancements in large language models (LLMs) have empowered autonomous web agents to execute natural language instructions directly on real-world webpages. However, existing agents often struggle with complex tasks involving dynamic interactions and long-horizon execution due to rigid planning strategies and hal…
arxiv
Score 21.0
2026-04-22 · Yingjie Gu, Bo Xiong, Yijuan Guo, Chao Li, Xiaojing Zhang, Liqiang Wang, Pengcheng Ren, Qi Sun, Jingyao Ma, Shidang Shi
Research Track A · General AI
For LLM agents, memory management critically impacts efficiency, quality, and security. While much research focuses on retention, selective forgetting, inspired by human cognitive processes (hippocampal indexing/consolidation theory and the Ebbinghaus forgetting curve), remains underexplored. We argue that in resource-cons…
arxiv
Score 20.8
2026-04-01 · Haiyang Guo, Yichen Shi, Fei Zhu, Wenzhuo Liu, Hongbo Zhao, Fanhu Zeng, Shijie Ma, Da-Han Wang, Xu-Yao Zhang
Research Track A · General AI
Video Large Language Models (Video-LLMs) require continual learning to adapt to non-stationary real-world data. However, existing benchmarks fall short of evaluating modern foundation models: many still rely on models without large-scale pre-training, and prevailing benchmarks typically partition a single dataset into …
arxiv
Score 20.8
2026-04-07 · Md Shamimul Islam, Luis G. Jaimes, Ayesha S. Dina
Research Track A · General AI
Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods are effective for known attack patterns, but they struggle to detect zero-day attacks and often miss modified variants of previously known attacks, while many machine learning approaches offer limited interpretability. These …
arxiv
Score 20.8
2026-05-07 · Tianle Wang, Zhaoyang Wang, Guangchen Lan, Xinpeng Wei, Sipeng Zhang, Guanwen Qiu, Abulhair Saparov
General AI
Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that offers independent …
arxiv
Score 20.8
2026-05-11 · Debashis Guha
Research Track A · General AI
Every adaptive learning system must alternate between two operations: consolidating what it already knows and expanding into new evidence. We propose Consolidation-Expansion Operator Mechanics (OpMech), a framework that makes this structure precise. The central object is the order-gap Ogap(θ; e), the d…
arxiv
Score 20.5
2026-03-13 · Orit Shahnovsky, Rotem Dror
Research Track B · General AI
Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why they fail or how they plan. This paper addresses this gap by formally treating web tasks as sequ…
huggingface
Score 20.5
2026-04-13 · CocoaBench Team, Shibo Hao, Zhining Zhang, Zhiqi Liang, Tianyang Liu, Yuheng Zha, Qiyue Gao, Jixuan Chen, Zilong Wang, Zhoujun Cheng, Haoxiang Zhang, Junli Wang, Hexi Jin, Boyuan Zheng, Kun Zhou, Yu Wang, Feng Yao, Licheng Liu, Yijiang Li, Zhifei Li, Zhengtao Han, Pracha Promthaw, Tommaso Cerruti, Xiaohan Fu, Ziqiao Ma, Jingbo Shang, Lianhui Qin, Julian McAuley, Eric P. Xing, Zhengzhong Liu, Rupesh Kumar Srivastava, Zhiting Hu
General AI
LLM agents now perform strongly in software engineering, deep research, GUI automation, and various other applications, while recent agent scaffolds and models are increasingly integrating these capabilities into unified systems. Yet, most evaluations still test these capabilities in isolation, which leaves a gap for m…
huggingface
Score 20.5
2026-04-16 · Qianqian Xie, Qingheng Xiong, He Zhu, Tiantian Xia, Xueming Han, Fanyu Meng, Jiakai Wang, Zhiqi Bai, Chengkang Jiang, Zhaohui Wang, Yubin Guo, Yuqing Wen, Jiayang Mao, Zijie Zhang, Shihao Li, Yanghai Wang, Yuxiang Ren, Junlan Feng, Jiaheng Liu
General AI
Deep Research Agents (DRAs) aim to solve complex, long-horizon research tasks involving planning, retrieval, multimodal understanding, and report generation, yet their evaluation remains challenging due to dynamic web environments and ambiguous task definitions. We propose DR³-Eval, a realistic and reproducible benc…
huggingface
Score 20.5
2026-04-16 · Jun Wang, Shuo Tan, Zelong Sun, Tiancheng Gu, Yongle Zhao, Ziyong Feng, Kaicheng Yang, Cewu Lu
General AI
Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG systems typically rely on generic retrieval signals that overlook the fine-grained visual semantics essential for complex reasoning. To address this limitation, we propose UniDo…
arxiv
Score 20.5
2026-04-24 · Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song
Research Track A · General AI
Many continual-learning methods modify gradients upstream (e.g., projection, penalty rescaling, replay mixing) while treating Adam as a neutral backend. We show this composition has a hidden failure mode. In a high-overlap, non-adaptive 8-domain continual LM, all shared-routing projection baselines collapse close to va…
arxiv
Score 20.5
2026-05-06 · Andreas Pattichis, Constantine Dovrolis
Research Track A · General AI
LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen wha…
arxiv
Score 20.3
2026-04-14 · Benjamin Stern, Peter Nadel
General AI
LLM agents with persistent memory store information as flat factual records, providing little context for temporal reasoning, change tracking, or cross-session aggregation. Inspired by the drawing effect [3], we introduce dual-trace memory encoding. In this method, each stored fact is paired with a concrete scene trace…
arxiv
Score 20.3
2026-04-21 · Md Nayem Uddin, Kumar Shubham, Eduardo Blanco, Chitta Baral, Gengyu Wang
Research Track A · General AI
Personalized agents that interact with users over long periods must maintain persistent memory across sessions and update it as circumstances change. However, existing benchmarks predominantly frame long-term memory evaluation as fact retrieval from past conversations, providing limited insight into agents' ability to …
arxiv
Score 20.2
2026-04-30 · Eyon Jang, Damon Falck, Joschka Braun, Nathalie Kirch, Achu Menon, Perusha Moodley, Scott Emmons, Roland S. Zimmermann, David Lindner
General AI
Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment. Successful RL relies on sufficient exploration of diverse actions by the model during training, which creates a potential failure mode: a model could strategically alt…
huggingface
Score 20.0
2026-04-09 · Chonghan Qin, Xiachong Feng, Weitao Ma, Xiaocheng Feng, Lingpeng Kong
General AI
Existing memory benchmarks for LLM agents evaluate explicit recall of facts, yet overlook implicit memory where experience becomes automated behavior without conscious retrieval. This gap is critical: effective assistants must automatically apply learned procedures or avoid failed actions without explicit reminders. We…
arxiv
Score 20.0
2026-04-20 · Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, William A. P. Smith, Yue Lu
Research Track A · General AI
In continual learning, the primary challenge is to learn new information without forgetting old knowledge. A common solution addresses this trade-off through regularization, penalizing changes to parameters critical for previous tasks. In most cases, this regularization term is directly added to the training loss and o…
arxiv
Score 20.0
2026-05-07 · Mei Wu, Wenchao Weng, Wenxin Su, Wenjie Tang, Wei Zhou
Research Track A · General AI
In recent years, the integration of non-topological space modeling with temporal learning methods has emerged as an effective approach for capturing spatio-temporal information in non-Euclidean graphs. However, most existing methods rely on static underlying graph structures, which are inadequate for capturing the cont…
arxiv
Score 20.0
2026-05-12 · Rodney A Sanchez, Ferat Sahin, Alex Ororbia, Jamison Heard
Research Track A · General AI
Advancements in reinforcement learning have produced a variety of complex and useful intrinsic driving forces; crucially, these drivers operate under a direct conditioning paradigm. This form of conditioning limits our agents' capacity by restricting how they learn from the environment as well as from others. Off-polic…
arxiv
Score 19.9
2026-05-04 · Masafumi Oyamada, Kunihiro Takeoka, Kosuke Akimoto, Ryoma Obara, Masafumi Enomoto, Haochen Zhang, Daichi Haraguchi, Takuya Tamura
Research Track B · General AI
What if a browser agent could learn your work simply by watching you do it? We present cotomi Act, a browser-based computer-using agent that combines reliable multi-step task execution with persistent organizational knowledge learned from user behavior. For execution, an agent scaffold with adaptive lazy observation, v…
arxiv
Score 19.8
2026-04-06 · Shiek Ruksana, Sailesh Kiran Kurra, Thipparthi Sanjay Baradwaj
General AI
Large Language Models (LLMs) have shown strong performance across a wide range of natural language processing tasks; however, their effectiveness is highly dependent on prompt design, structure, and embedded reasoning signals. Conventional prompt engineering methods largely rely on heuristic trial-and-error processes, …
arxiv
Score 19.8
2026-04-09 · Ziwei Zhou, Zeyuan Lai, Rui Wang, Yifan Yang, Zhen Xing, Yuqing Yang, Qi Dai, Lili Qiu, Chong Luo
General AI
Text-to-Audio-Video (T2AV) generation is rapidly becoming a core interface for media creation, yet its evaluation remains fragmented. Existing benchmarks largely assess audio and video in isolation or rely on coarse embedding similarity, failing to capture the fine-grained joint correctness required by realistic prompt…
arxiv
Score 19.8
2026-04-09 · Boer Zhang, Mingyan Wu, Dongzhuoran Zhou, Yuqicheng Zhu, Wendong Fan, Puzhen Zhang, Zifeng Ding, Guohao Li, Yuan He
Research Track B · General AI
Deep research requires reasoning over web evidence to answer open-ended questions, and it is a core capability for AI agents. Yet many deep research agents still rely on implicit, unstructured search behavior that causes redundant exploration and brittle evidence aggregation. Motivated by Anthropic's "think" tool parad…
arxiv
Score 19.5
2026-03-30 · Tiantian Wang, Xiang Xiang, Simon S. Du
Research Track A · General AI
In federated healthcare systems, Federated Class-Incremental Learning (FCIL) has emerged as a key paradigm, enabling continuous adaptive model learning among distributed clients while safeguarding data privacy. However, in practical applications, data across agent nodes within the distributed framework often exhibits n…
arxiv
Score 19.5
2026-04-06 · Yuwen Zhai, Runze Li, Liang Wang, Nian Shi, Liwu Xu, Wei Zhang, Ran Lin, Bo Xu, Benlei Cui
Research Track B · General AI
Evaluating GUI agents presents a distinct challenge: trajectories are long, visually grounded, and open-ended, yet evaluation must be both accurate and interpretable. Existing approaches typically apply a single holistic judgment over the entire action-observation sequence, a strategy that proves unreliable on long-hori…
arxiv
Score 19.5
2026-05-12 · Minjong Cheon
Research Track A · General AI
Catastrophic forgetting remains the central obstacle in continual learning (CL): parameters shared across tasks interfere with one another, and existing regularization methods such as EWC and SI apply uniform penalties without awareness of which input region a parameter serves. We propose KAN-CL, a continual learning f…
arxiv
Score 19.5
2026-05-12 · Neha Verma, Nikhil Mehta, Shao-Chuan Wang, Naijing Zhang, Alicia Tsai, Li Wei, Lukasz Heldt, Lichan Hong, Ed Chi, Xinyang Yi
Research Track A · General AI
Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates and addresses this challenge in the context of the Generative Retrieval (GenRetrieval) t…
arxiv
Score 19.4
2026-04-29 · Qisheng Hu, Quanyu Long, Wenya Wang
Research Track A · General AI
Memory-augmented LLM agents offer an appealing shortcut to continual learning: rather than updating model parameters, they accumulate experience in external memory, seemingly sidestepping the stability-plasticity dilemma of parametric learning. We show that this challenge does not disappear but resurfaces at the memory…
huggingface
Score 19.4
2026-04-30 · Zihao Li, Jiaru Zou, Feihao Fang, Xuying Ning, Mengting Ai, Tianxin Wei, Sirui Chen, Xiyuan Yang, Jingrui He
General AI
Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address special…
arxiv
Score 19.3
2026-04-13 · Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
Research Track B · General AI
GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of …
arxiv
Score 19.3
2026-04-13 · Xiaozhe Li, Tianyi Lyu, Yizhao Yang, Liang Shan, Siyi Yang, Ligao Zhang, Zhuoyi Huang, Qingwen Liu, Yang Li
Research Track B · General AI
Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context manag…
arxiv
Score 19.3
2026-04-13 · Haojie Bai, Aimin Li, Ruoyu Yao, Xiongwei Zhao, Tingting Zhang, Xing Zhang, Lin Gao, Jun Ma
General AI
Closed-loop cooperative driving requires planners that generate realistic multimodal multi-agent trajectories while improving safety and traffic efficiency. Existing diffusion planners can model multimodal behaviors from demonstrations, but they often exhibit weak scene consistency and remain poorly aligned with closed…
arxiv
Score 19.3
2026-04-16 · Ke Xu, Yuhao Wang, Yu Wang
General AI
Recent advancements in LLM agents are gradually shifting from reactive, text-based paradigms toward proactive, multimodal interaction. However, existing benchmarks primarily focus on reactive responses, overlooking the complexities of proactive intervention and monitoring. To bridge this gap, we introduce ProVoice-Benc…
arxiv
Score 19.3
2026-04-16 · Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo
Research Track B · General AI
The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often lea…
arxiv
Score 19.3
2026-04-22 · Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele
General AI
Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated great potential for enhancing the reasoning abilities of multimodal large language models (MLLMs). However, the reliance on language-centric priors and expensive manual annotations prevents MLLMs' intrinsic visual understanding and scalable …
arxiv
Score 19.3
2026-05-12 · Chen Li, Xiaoling Hu, Songzhu Zheng, Jiawei Zhou, Chao Chen
General AI
Large language models (LLMs) often produce answers with high certainty even when they are incorrect, making reliable confidence estimation essential for deployment in real-world scenarios. Verbalized confidence, where models explicitly state their confidence in natural language, provides a flexible and user-facing unce…
arxiv
Score 19.2
2026-04-29 · Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue, Kefei Chen, Yu Zhuang, Haoxiang Guan, Jiyan He, Jian Li, Yitong Duan, Yu Shi, Mengting Hu, Shuxin Zheng
General AI
Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. Just as interactive environments have often dr…
arxiv
Score 19.2
2026-04-30 · Bo Zhang, Tzu-Yen Ma, Zichen Tang, Junpeng Ding, Zirui Wang, Yizhuo Zhao, Peilin Gao, Zijie Xi, Zixin Ding, Haiyang Sun, Haocheng Gao, Yuan Liu, Liangjia Wang, Yiling Huang, Yujie Wang, Yuyue Zhang, Ronghui Xi, Yuanze Li, Jiacheng Liu, Zhongjun Yang, Haihong E
General AI
We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS features three key advances: (1) Domain-Specific Complexity: covering seven academic categories with 39 fine-grained subtypes, exposing intrinsic forensic difficulty, where e…
arxiv
Score 19.2
2026-04-30 · Jialu Shen, Han Lyu, Suyang Zhong, Hanzheng Li, Haoyi Tao, Nan Wang, Changhong Chen, Xi Fang
General AI
Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-specific characteristics. Here we introduce SpecVQA, a professional scientific-image benchmark for evaluating multimodal mod…
arxiv
Score 19.2
2026-05-01 · Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng
General AI
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence lengt…
arxiv
Score 19.2
2026-05-02 · Zebin Guo, Weidong Geng, Ruichen Mao
General AI
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding responses in external knowledge during inference. However, conventional RAG systems underperform on structured tabular data, largely due to coarse retrieval granularity and insufficient table semantic comprehension. To address these…
arxiv
Score 19.0
2026-05-06 · Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Stjepan Picek, Saraga Sakthidharan
Research Track A · General AI
The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank Adaptation (LoRA) modules. However, integrating these third-party adapters often induces catastrophic forgetting of the base model's foundational safety alignment. Restor…
arxiv
Score 18.8
2026-03-15 · Mohamed Aghzal, Gregory J. Stein, Ziyu Yao
Research Track B · General AI
Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze w…
arxiv
Score 18.8
2026-03-26 · Ünsal Öztürk, Hatef Otroshi Shahreza, Sébastien Marcel
General AI
Multimodal Large Language Models (MLLMs) have recently been explored as face verification systems that determine whether two face images are of the same person. Unlike dedicated face recognition systems, MLLMs approach this task through visual prompting and rely on general visual and reasoning abilities. However, the d…
arxiv
Score 18.8
2026-03-26 · Cristian Lupascu, Alexandru Lupascu
Research Track A · General AI
Large Language Model-based agents increasingly operate in high-stakes, multi-turn settings where factual grounding is critical, yet their memory systems typically rely on flat key-value stores or plain vector retrieval with no mechanism to track the provenance or trustworthiness of stored knowledge. We present Elephant…
arxiv
Score 18.8
2026-03-30 · Ziqi Miao, Haonan Jia, Lijun Li, Chen Qian, Yuan Xiong, Wenting Yan, Jing Shao
General AI
Reinforcement learning with verifiable rewards (RLVR) has substantially enhanced the reasoning capabilities of multimodal large language models (MLLMs). However, existing RLVR approaches typically rely on outcome-driven optimization that updates both perception and reasoning using a shared reward based solely on the fi…
arxiv
Score 18.8
2026-03-31 · Md Saad, Sajjad Hussain, Mohd Suhaib
General AI
This paper introduces a new hybrid framework that combines Reinforcement Learning (RL) and Large Language Models (LLMs) to improve robotic manipulation tasks. By utilizing RL for accurate low-level control and LLMs for high-level task planning and understanding of natural language, the proposed framework effectively co…
arxiv
Score 18.8
2026-04-27 · Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov
Research Track B · General AI
Existing web agent benchmarks have largely converged on short, single-site tasks on which frontier models are approaching saturation. However, real-world web use consists of long-horizon, multi-site workflows. Common web navigation tasks, such as comparing products across different domains, planning trips across multipl…
arxiv
Score 18.8
2026-05-07 · Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah, Corey Feld
General AI
Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation (RAG) that does not…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.7
2026-04-23 · Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
Research Track B · General AI
Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic framework built around three integrated comp…
- Review
- pending
- Role
- unreviewed
- Read
- now
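The abstract above names repetitive loops, where an agent cycles through the same failing actions, as one failure mode. A minimal sketch of one generic way to flag such a loop, assuming actions are logged as comparable strings; the function name and the `max_period` / `min_repeats` thresholds are hypothetical, and this is not VLAA-GUI's own mechanism:

```python
from typing import Optional, Sequence


def detect_action_loop(
    actions: Sequence[str], max_period: int = 3, min_repeats: int = 3
) -> Optional[int]:
    """Return the shortest cycle length repeating at the end of `actions`, else None.

    A cycle of length `period` counts as a loop only if it occurs at least
    `min_repeats` times back to back at the tail of the action history.
    """
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if len(actions) < span:
            continue  # not enough history to observe this many repeats
        tail = actions[-span:]
        pattern = tail[:period]
        if all(tail[i] == pattern[i % period] for i in range(span)):
            return period
    return None


history = ["open menu", "click Save", "type name", "click Save", "type name", "click Save", "type name"]
print(detect_action_loop(history))                            # 2
print(detect_action_loop(["click A", "click B", "click C"]))  # None
```

A check like this only reports that the agent is repeating itself; choosing a recovery action is a separate decision.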
arxiv
Score 18.5
2026-01-12 · Jihong Wang, Jiamu Zhou, Weiming Zhang, Weiwen Liu, Zhuosheng Zhang, Xingyu Lou, Weinan Zhang, Huarong Deng, Jun Wang
Research Track B · General AI
With the advancement of vision-language models, web automation has made significant progress. However, deploying autonomous agents in real-world settings remains challenging, primarily due to site heterogeneity, where generalist models lack domain-specific priors for diverse interfaces, and long-horizon instability, ch…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-03-09 · Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang
Research Track B · General AI
Modern agents powered by thinking LLMs achieve high accuracy through long chain-of-thought reasoning but incur substantial inference costs. While many LLMs now support configurable reasoning levels (e.g., high/medium/low), static strategies are often ineffective: using low-effort modes at every step leads to significan…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-04-03 · Wei Zou, Mingwen Dong, Miguel Romero Calvo, Shuaichen Chang, Jiang Guo, Dongkyu Lee, Xing Niu, Xiaofei Ma, Yanjun Qi, Jiarong Jiang
Research Track B · General AI
Memory makes LLM-based web agents personalized, powerful, yet exploitable. By storing past interactions to personalize future tasks, agents inadvertently create a persistent attack surface that spans websites and sessions. While existing security research on memory assumes attackers can directly inject into memory stor…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.5
2026-04-09 · Makanjuola Ogunleye, Eman Abdelrahman, Ismini Lourentzou
General AI
Large multimodal models are increasingly used as the reasoning core of embodied agents operating in 3D environments, yet they remain prone to hallucinations that can produce unsafe and ungrounded decisions. Existing inference-time hallucination mitigation methods largely target 2D vision-language settings and do not tr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.5
2026-04-14 · Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi
General AI
Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, w…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.5
2026-04-17 · Rohit Sinha, Aditya Kanade, Sai Srinivas Kancheti, Vineeth N Balasubramanian, Tanuja Ganu
General AI
Multimodal large language models (MLLMs) have achieved impressive progress on vision language benchmarks, yet their capacity for visual cognitive and visuospatial reasoning remains less understood. We introduce "Mind's Eye", a multiple-choice benchmark of eight visuo-cognitive tasks inspired by classic human intelligen…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.5
2026-04-22 · Muzhi Zhu, Shunyao Jiang, Huanyi Zheng, Zekai Luo, Hao Zhong, Anzhou Li, Kaijun Wang, Jintao Rong, Yang Liu, Hao Chen, Tao Lin, Chunhua Shen
General AI
Spatial intelligence is essential for multimodal large language models, yet current benchmarks largely assess it only from an understanding perspective. We ask whether modern generative or unified multimodal models also possess generative spatial intelligence (GSI), the ability to respect and manipulate 3D spatial cons…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.5
2026-04-22 · Juyong Jiang, Chenglin Cai, Chansung Park, Jiasi Shen, Sunghun Kim, Jianguo Li, Yue Wang
General AI
While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic multi-page websites remain highly challenging. Existing works are often limited to single-page static websites, while agentic frameworks typically rely on multi-turn execu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.4
2026-05-01 · Steven Tang, Xinze Xiong, Anna Hakhverdyan, Andrew Patterson, Jacob Adkins, Jiamin He, Esraa Elelimy, Parham Mohammad Panahi, Martha White, Adam White
Research Track A · General AI
In continual reinforcement learning (CRL), good performance requires never-ending learning, acting, and exploration in a big, partially observable world. Most CRL experiments have focused on loss of plasticity -- the inability to keep learning -- in one-off experiments where some unobservable non-stationarity is added …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.4
2026-05-01 · Ziwen Zhao, Menglin Yang
General AI
Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cro…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.4
2026-05-03 · Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang
General AI
Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, loc…
- Review
- pending
- Role
- unreviewed
- Read
- now
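The abstract above describes retrieval exposed through a fixed similarity interface with a single top-k step before reasoning. For reference, a minimal sketch of that conventional interface, assuming documents are already embedded as plain float vectors and ranked by cosine similarity; the function names and toy vectors are illustrative, not taken from the paper:

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k documents most similar to the query, best first."""
    return heapq.nlargest(k, corpus, key=lambda doc_id: cosine(query, corpus[doc_id]))


corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(top_k([1.0, 0.0], corpus, k=2))  # ['a', 'b']
```

Whatever the downstream reasoner sees is limited to this one ranked cut, which is the single-step access pattern the abstract calls a bottleneck.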
huggingface
Score 18.4
2026-05-04 · Ruoqi Liu, Imran Q. Mohiuddin, Austin J. Schoeffler, Kavita Renduchintala, Ashwin Nayak, Prasantha L. Vemu, Shivam C. Vedak, Kameron C. Black, John L. Havlik, Isaac Ogunmola, Stephen P. Ma, Roopa Dhatt, Jonathan H. Chen
General AI
We introduce PhysicianBench, a benchmark for evaluating LLM agents on physician tasks grounded in real clinical settings within electronic health record (EHR) environments. Existing medical agent benchmarks primarily focus on static knowledge recall, single-step atomic actions, or action intent without verifiable execut…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-12 · Cheng-Yen Li, Xuanjun Chen, Claire Lin, Wei-Yu Chen, Wenhua Nie, Hung-Yi Lee, Jyh-Shing Roger Jang
Research Track A · General AI
Large Language Models (LLMs) struggle with knowledge-intensive tasks due to hallucinations and fragmented reasoning over dispersed information. While Retrieval-Augmented Generation (RAG) grounds generation in external sources, existing methods often treat evidence as isolated units, failing to reconstruct the logical c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-13 · Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak
General AI
We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathem…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-13 · Artem Gadzhiev, Andrew Kislov
General AI
Providing AI agents with reliable long-term memory that does not hallucinate remains an open problem. Current approaches to memory for LLM agents -- sliding windows, summarization, embedding-based RAG, and flat fact extraction -- each reduce token cost but introduce catastrophic information loss, semantic drift, or unc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-14 · Sophia Sirko-Galouchenko, Monika Wysoczanska, Andrei Bursuc, Nicolas Thome, Spyros Gidaris
General AI
Multimodal large language models (MLLMs) perform well on many vision-language tasks but often struggle with vision-centric problems that require fine-grained visual reasoning. Recent evidence suggests that this limitation arises not from weak visual representations, but from under-utilization of visual information duri…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-14 · Yulin Chen, Tri Cao, Haoran Li, Yue Liu, Yibo Li, Yufei He, Le Minh Khoi, Yangqiu Song, Shuicheng Yan, Bryan Hooi
Research Track B · General AI
Web agents powered by vision-language models (VLMs) enable autonomous interaction with web environments by perceiving and acting on both visual and textual webpage content to accomplish user-specified tasks. However, they are highly vulnerable to prompt injection attacks, where adversarial instructions embedded in HTML…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-18 · Pollawat Hongwimol, Haoning Shang, Chutong Wang, Zhichao Wan, Yi Gao, Yuanming Li, Lin Gui, Wenhao Sun, Cheng Yu
Research Track A · General AI
Product attribute extraction in e-commerce is bottlenecked by ontologies that are inconsistent, incomplete, and costly to maintain. We present AutoPKG, a multi-agent Large Language Model (LLM) framework that automatically constructs a Product-attribute Knowledge Graph (PKG) from multimodal product content. AutoPKG indu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-20 · Terry Leitch
General AI
We present a systematic evaluation of large language model families -- spanning both proprietary cloud APIs and locally-hosted open-source models -- on two purpose-built benchmarks for System Dynamics AI assistance: the CLD Leaderboard (53 tests, structured causal loop diagram extraction) and the Discu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-20 · Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba
General AI
Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems toget…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-20 · Harish Santhanalakshmi Ganesan
General AI
Persistent memory is the bottleneck separating stateless chatbots from long-running agentic systems. Retrieval-augmented generation (RAG) over flat vector stores fragments facts into chunks, loses cross-session identity, and has no first-class notion of supersession or contradiction. Recent bitemporal knowledge-graph s…
- Review
- pending
- Role
- unreviewed
- Read
- now
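The abstract above contrasts flat vector stores with bitemporal designs that treat supersession as first-class. A minimal sketch of the generic bitemporal pattern, assuming integer timestamps and one value per (subject, predicate) at a time; `Fact`, `BitemporalStore`, and its methods are hypothetical names, and this is not the paper's system. Each fact carries valid time (when it held in the world) and transaction time (when the store believed it), so an update closes the old record instead of overwriting it:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: int                      # valid time: when the fact began holding in the world
    valid_to: Optional[int]              # None means it still holds
    recorded_at: int                     # transaction time: when the store learned this version
    superseded_at: Optional[int] = None  # None means this version is still believed


class BitemporalStore:
    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, subject: str, predicate: str, value: str, valid_from: int, now: int) -> None:
        """Record a new value, superseding (not deleting) any open fact for the same key."""
        open_facts = [
            f for f in self.facts
            if f.subject == subject and f.predicate == predicate
            and f.valid_to is None and f.superseded_at is None
        ]
        for old in open_facts:
            old.superseded_at = now  # stop believing the open-ended version from `now` on
            # Keep the corrected history: the old value held until the new one took effect.
            self.facts.append(Fact(old.subject, old.predicate, old.value, old.valid_from, valid_from, now))
        self.facts.append(Fact(subject, predicate, value, valid_from, None, now))

    def value_at(self, subject: str, predicate: str, valid_time: int, as_of: int) -> Optional[str]:
        """What did the store believe, at transaction time `as_of`, held at `valid_time`?"""
        for f in self.facts:
            if f.subject != subject or f.predicate != predicate:
                continue
            known = f.recorded_at <= as_of and (f.superseded_at is None or as_of < f.superseded_at)
            valid = f.valid_from <= valid_time and (f.valid_to is None or valid_time < f.valid_to)
            if known and valid:
                return f.value
        return None


store = BitemporalStore()
store.assert_fact("alice", "city", "Paris", valid_from=1, now=1)
store.assert_fact("alice", "city", "Berlin", valid_from=5, now=6)
print(store.value_at("alice", "city", valid_time=3, as_of=10))  # Paris
print(store.value_at("alice", "city", valid_time=7, as_of=10))  # Berlin
print(store.value_at("alice", "city", valid_time=7, as_of=4))   # Paris (belief before the update)
```

The sketch handles only a newer value replacing an older one; genuine contradictions, such as two sources disagreeing about the same interval, need more than this.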
arxiv
Score 18.3
2026-04-21 · Anton Kolonin, Alexey Glushchenko, Evgeny Bochkov, Abhishek Saxena
General AI
Evaluating the reasoning capabilities of Large Language Models (LLMs) for complex, quantitative financial tasks is a critical and unsolved challenge. Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations. To address this, we introduce a novel evaluation methodol…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-21 · Jing Jin, Hao Liu, Yan Bai, Yihang Lou, Zhenke Wang, Tianrun Yuan, Juntong Chen, Yongkang Zhu, Fanhu Zeng, Xuanyu Zhu, Yige Xu
General AI
Multimodal large language models (MLLMs) have shown promising reasoning abilities, yet evaluating their performance in specialized domains remains challenging. STEM reasoning is a particularly valuable testbed because it provides highly verifiable feedback, but existing benchmarks often permit unimodal shortcuts due to…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-05-12 · Yanting Miao, Yutao Sun, Dexin Wang, Mengyu Zhou, Pascal Poupart, Lei Lv, Qi Zhao, Li Wang, Hao Li, Xiaoxi Jiang, Guanjun Jiang
General AI
Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding external tools or image generators. However, existing methods usually follow an output-as-input latent paradigm and yield unstable gains. We identify evidence for a feature-space mism…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-05-12 · Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran, Valeriu Lacatusu, Sylvestre-Alvise Rebuffi, Alexandre Mourachko, Surya Parimi, Christophe Ropers, Rashel Moritz, Vanessa Stark, Hady Elsahar, Pierre Fernandez
General AI
We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as specula…
- Review
- pending
- Role
- unreviewed
- Read
- now
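TextSeal is described above as building on Gumbel-max sampling. A minimal sketch of the generic keyed Gumbel-max watermark that idea starts from, not TextSeal's dual-key generation or entropy-weighted scoring, assuming a toy next-token distribution, a SHA-256 keyed hash over the previous two tokens, and a detector that averages -log(1 - u) over the chosen tokens; all names are hypothetical:

```python
import hashlib
import math
import random
from typing import List, Sequence, Tuple

WINDOW = 2  # number of previous tokens hashed into the pseudorandom value


def keyed_uniform(key: bytes, context: Tuple[int, ...], token_id: int) -> float:
    """Deterministic pseudorandom u in (0, 1) from the secret key, context, and candidate token."""
    digest = hashlib.sha256(key + repr((context, token_id)).encode()).digest()
    n = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits so (n + 0.5) / 2**52 stays below 1.0
    return (n + 0.5) / 2**52


def watermarked_sample(probs: Sequence[float], key: bytes, context: Tuple[int, ...]) -> int:
    """Gumbel-max trick: argmax of log p + Gumbel noise, with the noise derived from the key."""
    best_token, best_score = -1, -math.inf
    for token_id, p in enumerate(probs):
        if p <= 0.0:
            continue
        u = keyed_uniform(key, context, token_id)
        score = math.log(p) - math.log(-math.log(u))  # log p + Gumbel(0, 1) noise
        if score > best_score:
            best_token, best_score = token_id, score
    return best_token


def detection_score(tokens: Sequence[int], key: bytes) -> float:
    """Mean of -log(1 - u) over chosen tokens: about 1 without the watermark, higher with it."""
    values = [
        -math.log(1.0 - keyed_uniform(key, tuple(tokens[i - WINDOW:i]), tokens[i]))
        for i in range(WINDOW, len(tokens))
    ]
    return sum(values) / max(1, len(values))


# Toy "language model": a fresh random distribution over 50 tokens at every step.
rng = random.Random(0)
key = b"secret-key"
tokens: List[int] = [0] * WINDOW
for _ in range(300):
    weights = [rng.random() for _ in range(50)]
    total = sum(weights)
    probs = [w / total for w in weights]
    tokens.append(watermarked_sample(probs, key, tuple(tokens[-WINDOW:])))

score_right = detection_score(tokens, key)
score_wrong = detection_score(tokens, b"other-key")
print(f"with key: {score_right:.2f}, with wrong key: {score_wrong:.2f}")
```

Because the noise is a deterministic function of key and context, the same context with the same distribution always yields the same token; that is one way plain Gumbel-max watermarking loses output diversity. Detection needs only the key and the token sequence, not the model.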
arxiv
Score 18.2
2026-04-30 · Binyan Xu, Xilin Dai, Kehuan Zhang
Research Track A · General AI
Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error with provable consequences for agent capability, long-term learning, and security. Retrie…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.2
2026-04-30 · Sudong Wang, Weiquan Huang, Xiaomin Yu, Zuhao Yang, Hehai Lin, Keming Wu, Chaojun Xiao, Chen Chen, Wenxuan Wang, Beier Zhu, Yunjian Zhang, Chengwei Qin
General AI
The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (RLVR). However, SFT introduces distributional drift that neither preserves the model's original capabilities nor faithfully matc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.2
2026-05-03 · Arash Ahmadi, Sarah Sharif, Yaser Banad
General AI
Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives policy optimization. This paper introduc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.2
2026-05-04 · Chenchen Zhang
General AI
As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individual actions but also how work is spawned, delegated, communicated, aggregated, and stopped. This paper studies RL for LLM-based multi-agent systems through orchestration…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-03-15 · Xudong Wang, Gan Li, Zhiyu Liu, Yao Wang, Lianqing Liu, Zhi Han
Research Track A · General AI
Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong V…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.0
2026-03-26 · Dingjie Song, Tianlong Xu, Yi-Fan Zhang, Hang Li, Zhiling Yan, Xing Fan, Haoyang Li, Lichao Sun, Qingsong Wen
General AI
Assessing student handwritten scratchwork is crucial for personalized educational feedback but presents unique challenges due to diverse handwriting, complex layouts, and varied problem-solving approaches. Existing educational NLP primarily focuses on textual responses and neglects the complexity and multimodality inhe…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.0
2026-03-29 · Shijian Wang, Jiarui Jin, Runhao Fu, Zexuan Yan, Xingjian Wang, Mengkang Hu, Eric Wang, Xiaoxi Li, Kangning Zhang, Li Yao, Wenxiang Jiao, Xuelian Cheng, Yuan Lu, Zongyuan Ge
General AI
Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage st…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.0
2026-03-29 · Shi Qiu, Junyi Deng, Yiwei Deng, Haoran Dong, Jieyu Fu, Mao Li, Zeyu Li, Zhaolong Zhang, Huiwen Zheng, Leidong Bao, Anqi Lv, Zihan Mo, Yadi Niu, Yiyang Peng, Yu Tian, Yili Wang, Ziyu Wang, Zi-Yu Wang, Jiashen Wei, Liuheng Wu, Aoran Xue, Leyi Yang, Guanglu Yuan, Xiarui Zhan, Jingjun Zhang, Zifan Zheng, Pengfei Liu, Linrui Zhen, Kaiyang Li, Qichang Li, Ziheng Zhou, Guo-En Nian, Yunwei Xiao, Qing-Hong Cao, Linjie Dai, Xu Feng, Peng Gao, Ying Gu, Chang Liu, Jia Liu, Ming-xing Luo, Yan-Qing Ma, Liang-You Peng, Huichao Song, Shufeng Wang, Chenxu Wang, Tao Wang, Yi-Nan Wang, Chengyin Wu, Pengwei Zhao, Hua Xing Zhu
General AI
AI agents powered by large language models exhibit strong reasoning and problem-solving capabilities, enabling them to assist scientific research tasks such as formula derivation and code generation. However, whether these agents can reliably perform end-to-end reproduction from real scientific papers remains an open q…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.0
2026-04-14 · Zhiyuan Zeng, Jiameng Huang, Zhangyue Yin, Jiashuo Liu, Ziniu Li, Bingrui Li, Yuhao Wu, Yining Zheng, Ge Zhang, Wenhao Huang, Xipeng Qiu
General AI
Reinforcement learning with verifiable rewards (RLVR) has become a central paradigm for improving reasoning and code generation in large language models, and GRPO-style training is widely adopted for its simplicity and effectiveness. However, an important design choice remains underexplored: how token-level policy grad…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-04-18 · Zhaokang Liao, Yingguo Gao, Yi Yang, Yongheng Hu, Jingting Ding
Research Track A · General AI
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach to improve the reasoning abilities of Large Language Models (LLMs). Among RLVR algorithms, Group Relative Policy Optimization (GRPO) and its variants have demonstrated strong performance and high training efficiency. However, GRPO…
- Review
- pending
- Role
- unreviewed
- Read
- now
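The abstract above refers to Group Relative Policy Optimization (GRPO). A minimal sketch of its group-relative advantage and the PPO-style clipped term it feeds, assuming one scalar verifiable reward per sampled response and normalization by the population standard deviation (some implementations use the sample standard deviation); the KL penalty is omitted, and this is the generic GRPO recipe, not this paper's variant:

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Advantage of each response = (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_token_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate for one token; `ratio` is pi_new / pi_old for that token."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Four sampled answers to one prompt, rewarded 1 if verified correct, else 0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])           # [1.0, -1.0, -1.0, 1.0]
print(group_relative_advantages([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
print(clipped_token_objective(1.5, 1.0))           # 1.2
```

When every response in a group receives the same reward, all advantages are zero and that prompt contributes no policy-gradient signal; this follows directly from the normalization.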
arxiv
Score 18.0
2026-04-24 · Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia
Research Track B · General AI
As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.0
2026-05-09 · Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, Christopher G. Brinton
General AI
Large language model (LLM) agents face a structural tension: cloud agents provide strong reasoning but expose user data, while on-device agents preserve privacy at the cost of overall capability. Existing device-cloud designs treat this boundary as a compute split rather than a trust boundary suited to agentic workload…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-05-12 · Xuhao Hu, Xi Zhang, Haiyang Xu, Kyle Qiao, Jingyi Yang, Xuanjing Huang, Jing Shao, Ming Yan, Jieping Ye
Research Track B · General AI
Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal execution paths. This diffi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-05-12 · Phu-Quy Nguyen-Lam, Phu-Hoa Pham, Dao Sy Duy Minh, Chi-Nguyen Tran, Huynh Trung Kiet, Long Tran-Thanh
Research Track A · General AI
Object-centric representations promise a key property for few-shot learning: Rather than treating a scene as a single unit, a model can decompose it into individual object-level parts that can be matched and compared across different concepts. In practice, this potential is rarely realized. Continual learners either co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.9
2026-05-04 · Joern Hentsch
Research Track A · General AI
Continual learning systems face a fundamental tension between plasticity -- acquiring new knowledge -- and stability -- retaining prior knowledge. We introduce MPCS (Multi-Plasticity Continual System), a neuroplastic architecture that integrates eleven complementary mechanisms: task-driven neurogenesis, Fourier-encoded…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-03-26 · Abdullah Hamdi, Changchun Yang, Xin Gao
General AI
Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-03-26 · Liang Zhang, Yu Fu, Xinyi Jin
General AI
Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship us…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-03-26 · André G. Viveiros, Nuno Gonçalves, Matthias Lindemann, André Martins
General AI
While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. While recent approaches…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-03-26 · Jiwook Han, Geo Ahn, Youngrae Kim, Jinwoo Choi
General AI
Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memorize dataset-specific s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-03-26 · Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, Guanjun Jiang
General AI
Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or seq…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-03-30 · Huanxuan Liao, Zhongtao Jiang, Yupu Hao, Yuqiao Tan, Shizhu He, Jun Zhao, Kun Xu, Kang Liu
General AI
Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long temporal context prohibitive. We argue that the bottleneck lies not in how post-encoding representations are compresse…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-03-31 · Fumihiko Tsuchiya, Taiki Miyanishi, Mahiro Ukai, Nakamasa Inoue, Shuhei Kurita, Yusuke Iwasawa, Yutaka Matsuo
General AI
Counting in long videos remains a fundamental yet underexplored challenge in computer vision. Real-world recordings often span tens of minutes or longer and contain sparse, diverse events, making long-range temporal reasoning particularly difficult. However, most existing video counting benchmarks focus on short clips …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-04-06 · Ke Li, Maoliang Li, Jialiang Chen, Jiayu Chen, Zihao Zheng, Shaoqi Wang, Xiang Chen
General AI
Video mashup creation represents a complex video editing paradigm that recomposes existing footage to craft engaging audio-visual experiences, demanding intricate orchestration across semantic, visual, and auditory dimensions and multiple levels. However, existing automated editing frameworks often overlook the cross-l…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-04-06 · Shuai Liu, Shulin Tian, Kairui Hu, Yuhao Dong, Zhe Yang, Bo Li, Jingkang Yang, Chen Change Loy, Ziwei Liu
General AI
Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints, as strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent scalable training an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-04-07 · Wang Yang, Chaoda Song, Xinpeng Li, Debargha Ganguly, Chuang Ma, Shouren Wang, Zhihao Dou, Yuli Zhou, Vipin Chaudhary, Xiaotian Han
General AI
Existing agent benchmarks suffer from two critical limitations: high environment interaction overhead (up to 41% of total evaluation time) and imbalanced task horizon and difficulty distributions that make aggregate scores unreliable. To address these issues, we propose ACE-Bench, built around a unified grid-based plan…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-04-07 · Bowen Ye, Rang Li, Qibin Yang, Yuanxin Liu, Linli Yao, Hanglong Lv, Zhihui Xie, Chenxin An, Lei Li, Lingpeng Kong, Qi Liu, Zhifang Sui, Tong Yang
General AI
Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-world software environments. However, existing agent benchmarks suffer from three critical limitations: (1) trajectory-opaque grading that checks only final outputs, (2) underspecified safety and robustness evalu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-04-07 · Juekai Lin, Yun Zhu, Honglin Lin, Sijing Li, Tianwei Lin, Zheng Liu, Xiaoyang Wang, Wenqiao Zhang, Lijun Wu
General AI
Graphics Program Synthesis is pivotal for interpreting and editing visual data, effectively facilitating the reverse-engineering of static visuals into editable TikZ code. While TikZ is the de facto standard for scientific schematics due to its programmatic flexibility, its requirement for rigorous spatial precision pr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-04-09 · Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna
Research Track B · General AI
Web agents--autonomous systems that navigate and execute tasks on the web on behalf of users--have the potential to transform how people interact with the digital world. However, the most capable web agents today rely on proprietary models with undisclosed training data and recipes, limiting scientific understanding, r…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-04-09 · Hang Ye, Xiaoxuan Ma, Fan Lu, Wayne Wu, Kwan-Yee Lin, Yizhou Wang
General AI
Digital human generation has been studied for decades and supports a wide range of real-world applications. However, most existing systems are passively animated, relying on privileged state or scripted control, which limits scalability to novel environments. We instead ask: how can digital humans actively behave using…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-01-08 · Uday Allu, Sonu Kedia, Tanmay Odapally, Biddwan Ahmed
General AI
Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking strategies to balance retrieval quality, latency, and operational cost. Traditional chunking approaches, such as fixed-size, rule-based, or fully agentic chunking, often suffer from high token consumption, redundant text gener…
- Review
- pending
- Role
- unreviewed
- Read
- now
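Fixed-size chunking, the simplest of the traditional strategies listed above, amounts to a sliding window over tokens. A minimal sketch of that baseline, assuming the text is already split into tokens and using a hypothetical `overlap` parameter; it is the kind of approach the abstract contrasts with, not the authors' method:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fixed_size_chunks(tokens: Sequence[T], chunk_size: int, overlap: int = 0) -> List[List[T]]:
    """Split tokens into windows of `chunk_size`, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


words = "retrieval quality depends on where the text is cut into chunks".split()
for chunk in fixed_size_chunks(words, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps content that straddles a boundary retrievable from either side, at the price of indexing those tokens more than once; cutting purely by count also ignores sentence and section boundaries.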
huggingface
Score 17.5
2026-04-13 · Yinuo Yang, Zixian Ma, Manasi Ganti, Jieyu Zhang, Ranjay Krishna
General AI
We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward models evaluate each response independently, requiring multiple forward passes, one for each potential response. Our approach concatenates multiple responses with separato…
- Review
- pending
- Role
- unreviewed
- Read
- now
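The abstract above describes scoring all candidate responses in one forward pass by concatenating them with separators. A minimal sketch of the packing and score-gathering bookkeeping only, assuming a hypothetical separator id, one scalar output per position from some reward model, and a readout at the separator that closes each response; the excerpt does not specify the actual separator handling, readout position, or attention masking:

```python
from typing import List, Sequence, Tuple

SEP_ID = -1  # hypothetical separator token id


def pack_candidates(
    prompt: Sequence[int], responses: Sequence[Sequence[int]], sep: int = SEP_ID
) -> Tuple[List[int], List[int]]:
    """Build prompt SEP r1 SEP r2 SEP ... and return it with each response's readout index."""
    packed = list(prompt) + [sep]
    readout_positions = []
    for response in responses:
        packed.extend(response)
        readout_positions.append(len(packed))  # index of the separator appended next
        packed.append(sep)
    return packed, readout_positions


def gather_scores(per_position_scores: Sequence[float], positions: Sequence[int]) -> List[float]:
    """Pick one reward per candidate out of a single pass's per-position outputs."""
    return [per_position_scores[p] for p in positions]


packed, positions = pack_candidates([10, 11], [[1, 2], [3], [4, 5, 6]])
print(packed)     # [10, 11, -1, 1, 2, -1, 3, -1, 4, 5, 6, -1]
print(positions)  # [5, 7, 11]

# Stand-in for the model's output: one score per packed position (hypothetical values).
fake_outputs = [0.0] * len(packed)
fake_outputs[5], fake_outputs[7], fake_outputs[11] = 0.2, 0.9, 0.4
scores = gather_scores(fake_outputs, positions)
print(scores, "best candidate:", scores.index(max(scores)))  # [0.2, 0.9, 0.4] best candidate: 1
```

Compared with one pass per response, the prompt is encoded once; the trade-off is a single longer sequence.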
huggingface
Score 17.5
2026-04-14 · Tomer Ashuach, Liat Ein-Dor, Shai Gretz, Yoav Katz, Yonatan Belinkov
General AI
Humans use introspection to evaluate their understanding through private internal states inaccessible to external observers. We investigate whether large language models possess similar privileged knowledge about answer correctness, information unavailable through external observation. We train correctness classifiers …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-15 · Genghan Zhang, Shaowei Zhu, Anjiang Wei, Zhenyu Song, Allen Nie, Zhen Jia, Nandita Vijaykumar, Yida Wang, Kunle Olukotun
General AI
We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-17 · Jize Wang, Xuanxuan Liu, Yining Li, Songyang Zhang, Yijun Wang, Zifei Shan, Xinyi Le, Cailian Chen, Xinping Guan, Dacheng Tao
General AI
The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements, relying on AI-generated queries, dummy tools, and limited system-level coordination…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-21 · Hongnan Ma, Han Wang, Shenglin Wang, Tieyue Yin, Yiwei Shi, Yucong Huang, Yingtian Zou, Muning Wen, Mengyue Yang
General AI
Large language models can generate plausible game code, but turning this capability into iterative creative improvement remains difficult. In practice, single-shot generation often produces brittle runtime behavior, weak accumulation of experience across versions, and creativity scores that are too subjective to serve …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-21 · Zijie Li, Yichun Shi, Jingxiang Sun, Ye Wang, Yixuan Huang, Zhiyao Guo, Xiaochen Lian, Peihao Zhu, Yu Tian, Zhonghua Zhai, Peng Wang
General AI
We present MMCORE, a unified framework designed for multimodal image generation and editing. MMCORE leverages a pre-trained Vision-Language Model (VLM) to predict semantic visual embeddings via learnable query tokens, which subsequently serve as conditioning signals for a diffusion model. This streamlined design effect…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-21 · Xiachong Feng, Yi Jiang, Xiaocheng Feng, Deyi Yin, Libo Qin, Yangfan Ye, Lei Huang, Weitao Ma, Yuxuan Gu, Chonghan Qin, Bing Qin, Lingpeng Kong
General AI
Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.5
2026-04-26 · Alexander Bering
Research Track A · General AI
Despite a century of empirical memory research, existing AI agent memory systems rely on system-engineering metaphors (virtual-memory paging, flat LLM storage, Zettelkasten notes), none of which integrates principles of consolidation, forgetting, and reconsolidation. We present ZenBrain, a multi-layer memory architecture integ…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-27 · Yingqian Min, Kun Zhou, Yifan Li, Yuhuan Wu, Han Peng, Yifan Du, Wayne Xin Zhao, Min Yang, Ji-Rong Wen
General AI
Recent advancements in reinforcement learning with verifiable rewards (RLVR) have significantly improved the complex reasoning ability of vision-language models (VLMs). However, its outcome-level supervision is too coarse to diagnose and correct errors within the reasoning chain. To this end, we propose Perceval, a pro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.5
2026-05-07 · Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan
Research Track B · General AI
The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.4
2026-04-29 · Jinbiao Wei, Kangqi Ni, Yilun Zhao, Guo Gan, Arman Cohan
General AI
Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.4
2026-04-30 · Qiyao Wang, Haoran Hu, Longze Chen, Hongbo Wang, Hamid Alinejad-Rokny, Yuan Lin, Min Yang
General AI
With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent-based, project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially well-structured, information-rich inputs and static execution set…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.4
2026-05-02 · Wenhao Li, Xiu Su, Yichao Cao, Hongyan Xu, Xiaobo Xia, Shan You, Yi Chen, Chang Xu
Research Track A · General AI
Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning capability, lack of status monitoring, and difficulty in self-correction. In this…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-14 · Yifan Du, Zikang Liu, Jinbiao Peng, Jie Wu, Junyi Li, Jinyang Li, Wayne Xin Zhao, Ji-Rong Wen
General AI
Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managing the heterogeneous information and high token costs associated with multimodal inputs over long horizons remains a critical challenge, as existing methods often suffe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-20 · Jinghui Lu, Jiayi Guan, Zhijian Huang, Jinlong Li, Guang Li, Lingdong Kong, Yingyan Li, Han Wang, Shaoqing Xu, Yuechen Luo, Fang Li, Chenxu Dang, Junli Wang, Tao Xu, Jing Wu, Jianhua Wu, Xiaoshuai Hao, Wen Zhang, Tianyi Jiang, Lingfeng Zhang, Lei Zhou, Yingbo Tang, Jie Wang, Yinfeng Gao, Xizhou Bu, Haochen Tian, Yihang Qiu, Feiyang Jia, Lin Liu, Yigu Ge, Hanbing Li, Yuannan Shen, Jianwei Cui, Hongwei Xie, Bing Wang, Haiyang Sun, Jingwei Zhao, Jiahui Huang, Pei Liu, Zeyu Zhu, Yuncheng Jiang, Zibin Guo, Chuhong Gong, Hanchao Leng, Kun Ma, Naiyang Wang, Guang Chen, Kuiyuan Yang, Hangjun Ye, Long Chen
General AI
Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive for real-time deployment. Latent CoT methods attempt to close this gap by compressing reasoning into continuous hidden states, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-22 · Naizhong Xu
Research Track A · General AI
Modern retrieval-augmented generation (RAG) systems treat vector embeddings as static, context-free artifacts: an embedding has no notion of when it was created, how trustworthy its source is, or which other embeddings depend on it. This flattening of knowledge has a measurable cost: recent work on VersionRAG reports t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-24 · Lihao Zheng, Zhenwei Shao, Yu Zhou, Yan Yang, Xintian Shen, Jiawei Chen, Hao Ma, Tao Wei
General AI
Although Multimodal Large Language Models (MLLMs) have advanced rapidly, they still face notable challenges in fine-grained multi-image understanding, often exhibiting spatial hallucination, attention leakage, and failures in object constancy. In addition, existing approaches typically rely on expensive human annotatio…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-24 · Jichao Wang, Liuyang Bian, Yufeng Zhou, Han Xiao, Yue Pan, Guozhi Wang, Hao Wang, Zhaoxiong Wang, Yafei Wen, Xiaoxin Chen, Shuai Ren, Lingfang Zeng
Research Track B · General AI
As Multimodal Large Language Models (MLLMs) mature, GUI agents are evolving from static interactions to complex navigation. While Reinforcement Learning (RL) has emerged as a promising paradigm for training MLLM agents on dynamic GUI tasks, its effective application faces a dilemma. Standard Offline RL often relies on …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-27 · Xihang Wang, Zihan Wang, Chengkai Huang, Quan Z. Sheng, Lina Yao
General AI
Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-27 · Mofei Li, Taozhi Chen, Guowei Yang, Jia Li
Research Track A · General AI
Large Language Models (LLMs) excel at general code generation, but their performance drops sharply in enterprise settings that rely on internal private libraries absent from public pre-training corpora. While Retrieval-Augmented Generation (RAG) offers a training-free alternative by providing static API documentation, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-05-12 · Haoyu Wang, Yuliang Song, Tao Li, Zhiwei Deng, Yaqing Wang, Deepak Ramachandran, Eldan Cohen, Dan Roth
General AI
Large Language Models (LLMs) struggle to solve complex combinatorial problems through direct reasoning, so recent neuro-symbolic systems increasingly use them to synthesize executable solvers. A central design question is how the LLM should represent the solver, and whether it should also attempt to optimize search. We…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-05-12 · Wufei Ma, Chloe Wang, Siyi Chen, Jiawei Peng, Patrick Li, Alan Yuille
General AI
While self-supervised pretraining has reduced vision systems' reliance on synthetic data, simulation remains an indispensable tool for closed-loop optimization and rigorous out-of-distribution (OOD) evaluation. However, modern simulation platforms often present steep technical barriers, requiring extensive expertise in…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-05-12 · Zhong Li, Zihan Guo, Xiaohan Lu, Juntao Wang, Jie Song, Chao Shen, Jiageng Wu, Mingyang Sun
Research Track A · General AI
Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization sema…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.2
2026-04-29 · Saber Zerhoudi, Michael Granitzer, Jelena Mitrovic
General AI
Training trustworthy agentic LLMs requires data that shows the grounded reasoning process, not just the final answer. Existing datasets fall short: question-answering data is outcome-only, chain-of-thought data is not tied to specific documents, and web-agent datasets track interface actions rather than the core retrie…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.2
2026-04-29 · Wanrong Zheng, Yunhao Ge, Laurent Itti
General AI
Breakthrough progress in vision-based navigation through unknown environments has been achieved by using multimodal large language models (MLLMs). These models can plan a sequence of motions by evaluating the current view at each time step against the task and goal given to the agent. However, current zero-shot Vision-…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.2
2026-04-30 · Yanting Wang, Chenlong Yin, Ying Chen, Jinyuan Jia
General AI
Long-context large language models (LLMs), for example Gemini-3.1-Pro and Qwen-3.5, are widely used to power many real-world applications, such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.2
2026-05-01 · Yawen Qin, Ke Qiu, Qin Zhang
General AI
Dance serves as both a cultural cornerstone and a medium for personal expression, yet the rapid growth of online dance content has made personalized discovery increasingly difficult. Text-based dance retrieval offers a natural interface for users to search with choreographic intent, but it remains underexplored because…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.2
2026-05-01 · Arunabh Srivastava, Mohammad A. Khojastepour, Srimat Chakradhar, Sennur Ulukus
General AI
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-03-19 · Minhua Lin, Zhiwei Zhang, Hanqing Lu, Hui Liu, Xianfeng Tang, Qi He, Xiang Zhang, Suhang Wang
General AI
Memory-augmented LLM agents maintain external memory banks to support long-horizon interaction, yet most existing systems treat construction, retrieval, and utilization as isolated subroutines. This creates two coupled challenges: strategic blindness on the forward path of the memory cycle, where construction and retri…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-04-01 · Shuguang Chen, Adil Hafeez, Salman Paracha
General AI
Agentic applications based on large language models increasingly rely on multi-step interaction loops involving planning, action execution, and environment feedback. While such systems are now deployed at scale, improving them post-deployment remains challenging. Agent trajectories are voluminous and non-deterministic,…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-04-03 · Shufan Jiang, Chios Chen, Zhiyang Chen
General AI
The autonomous discovery of bugs remains a significant challenge in modern software development. Compared to code generation, the complexity of dynamic runtime environments makes bug discovery considerably harder for large language models (LLMs). In this paper, we take game development as a representative domain and in…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-04-06 · Varun Pratap Bhardwaj
Research Track A · General AI
AI coding agents operate in a paradox: they possess vast parametric knowledge yet cannot remember a conversation from an hour ago. Existing memory systems store text in vector databases with single-channel retrieval, require cloud LLMs for core operations, and implement none of the cognitive processes that make human m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-04-07 · Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai
Research Track A · General AI
The static "train then deploy" paradigm fundamentally prevents Large Language Models (LLMs) from dynamically adapting their weights in response to the continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast we…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-04-09 · Sai Srinivas Kancheti, Aditya Kanade, Rohit Sinha, Vineeth N Balasubramanian, Tanuja Ganu
General AI
Multimodal reasoning models (MRMs) trained with reinforcement learning with verifiable rewards (RLVR) show improved accuracy on visual reasoning benchmarks. However, we observe that accuracy gains often come at the cost of reasoning quality: generated Chain-of-Thought (CoT) traces are frequently inconsistent with the f…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-04-09 · Ailing Zeng, Casper Yang, Chauncey Ge, Eddie Zhang, Garvey Xu, Gavin Lin, Gilbert Gu, Jeremy Pi, Leo Li, Mingyi Shi, Sheng Bi, Steven Tang, Thorn Hang, Tobey Guo, Vincent Li, Xin Tong, Yikang Li, Yuchen Sun, Yue Zhao, Yuhan Lu, Yuwei Li, Zane Zhang, Zeshi Yang, Zi Ye
General AI
Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Learning such performance from video is a promising alternative to traditional 3D pipelines. However, existing video models struggle to jointly achieve high expressiveness,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-04-19 · Ou Wu
Research Track A · General AI
Large language model optimization has historically bifurcated into isolated data-centric and model-centric paradigms: the former manipulates involved samples through selection, augmentation, or poisoning, while the latter tunes model weights via masking, quantization, or low-rank adaptation. This paper establishes a un…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-03-24 · Yenchia Feng, Chirag Sharma, Karime Maamari
Research Track B · General AI
Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows remains an open problem. In long-horizon settings, agents frequently suffer from cascading errors and environmental stochasticity; a single misstep in a dynamic interface can lead to task failure, resulting in h…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-03-26 · Vishal Narnaware, Animesh Gupta, Kevin Zhai, Zhenyi Wang, Mubarak Shah
General AI
Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet these architectures remain prone to multimodal hallucinations. This structural vulnerability stems from an algorithmic flaw: the decoder ranks candidate tokens based on textual likelihood without …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-03-30 · Kaushitha Silva, Srinath Perera
General AI
Large Language Models (LLMs) have demonstrated impressive capabilities in code generation. While an interactive feedback loop can improve performance, writing effective tests is a non-trivial task. Early multi-agent frameworks, such as AgentCoder, automated this process but relied on generated tests as absolute ground …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-03-30 · Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue
General AI
Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we pres…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-03-30 · Zimu Zhang, Yucheng Zhang, Xiyan Xu, Ziyin Wang, Sirui Xu, Kai Zhou, Bing Zhou, Chuan Guo, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui
General AI
Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior (finger articulation, contact timing, and inter-hand coordination), and existing resources lack high-fidelity bimanual …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-03-31 · Yang Shen, Zhenyi Yi, Ziyi Zhao, Lijun Sun, Dongyang Li, Chin-Teng Lin, Yuhui Shi
Research Track A · General AI
As AI agents evolve, the community is rapidly shifting from single Large Language Models (LLMs) to Multi-Agent Systems (MAS) to overcome cognitive bottlenecks in automated research. However, the optimal multi-agent coordination framework for these autonomous agents remains largely unexplored. In this paper, we present …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-04-06 · Lei Zhang, Junjiao Tian, Zhipeng Fan, Kunpeng Li, Jialiang Wang, Weifeng Chen, Markos Georgopoulos, Felix Juefei-Xu, Yuxiang Bao, Julian McAuley, Manling Li, Zecheng He
General AI
Humans paint images incrementally: they plan a global layout, sketch a coarse draft, inspect it, and refine details; most importantly, each step is grounded in the evolving visual state. However, can unified multimodal models trained on text-image interleaved datasets also imagine the chain of intermediate states? In…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-04-07 · Yuchi Wang, Haiyang Yu, Weikang Bian, Jiefeng Long, Xiao Liang, Chao Feng, Hongsheng Li
General AI
MLLMs have been successfully applied to multimodal embedding tasks, yet their generative reasoning capabilities remain underutilized. Directly incorporating chain-of-thought reasoning into embedding learning introduces two fundamental challenges. First, structural misalignment between instance-level reasoning and pairw…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-04-07 · Komal Kumar, Aman Chadha, Salman Khan, Fahad Shahbaz Khan, Hisham Cholakkal
General AI
The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) have demonstrated strong potential for understanding user intent and are being trained to utilize vari…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-04-28 · Jianghao Lin, Zi Ling, Chenyu Zhou, Tianyi Xu, Ruoqing Jiang, Zizhuo Wang, Dongdong Ge
General AI
Optimization modeling underpins real-world decision-making in logistics, manufacturing, energy, and public services, but reliably solving such problems from natural-language requirements remains challenging for current large language models (LLMs). In this paper, we propose Agora-Opt, a modular agentic framework…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-04-28 · Guanglin Niu, Bo Li
General AI
Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. Therefore, we…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-04-28 · Xiyuan Yang, Jiaru Zou, Rui Pan, Ruizhong Qiu, Pan Lu, Shizhe Diao, Jindong Jiang, Hanghang Tong, Tong Zhang, Markus J. Buehler, Jingrui He, James Zou
General AI
Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend this scaling principle from a single model to multi-agent systems and ask: can agent collaboration itself be scaled through recursion? To …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-04-28 · Ran Gu, Benjamin Hou, Mélanie Hébert, Asmita Indurkar, Yifan Yang, Emily Y. Chew, Tiarnán D. L. Keenan, Zhiyong Lu
General AI
Despite strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multimodal large language models (MLLMs) integrate diagnostic predictions with clinically meaningful dialogue to support clin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-07 · Mingwei Xu, Hao Fang
General AI
Reinforcement learning with verifiable rewards (RLVR), owing to its deterministic verification, has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy Optimization (GRPO)…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-07 · Ziyu Zhai, Siyou Li, Juexi Shao, Juntao Yu
General AI
Developing ceramic glazes is a costly, time-consuming process of trial and error due to complex chemistry, placing a significant burden on independent artists. While recent advances in multimodal AI offer a modern solution, the field lacks the large-scale datasets required to train these models. We propose GlazyBench, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-07 · Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee
General AI
LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existin…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.8
2026-05-10 · Kun Xiang, Terry Jingchen Zhang, Zirong Liu, Bokai Zhou, Yueling Tang, Junjie Yu, Jiacong Lu, Shangrui Huang, Heng Li, Likui Zhang, Kunkun Liu, Changzheng Zhang, Yangle Fang, Boqiang Guo, Hui-Ling Zhen, Dandan Tu, Yinya Huang, Xiaodan Liang
General AI
We introduce SeePhys Pro, a fine-grained modality transfer benchmark that studies whether models preserve the same reasoning capability when critical information is progressively transferred from text to image. Unlike standard vision-essential benchmarks that evaluate a single input form, SeePhys Pro features four sema…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.6
2026-05-11 · Kefei Tian, Yuansheng Lian, Kai Yang, Xiangdong Chen, Shen Li
General AI
Safety-critical planning in complex environments, particularly at urban intersections, remains a fundamental challenge for autonomous driving. Existing methods, whether rule-based or data-driven, frequently struggle to capture complex scene semantics, infer potential risks, and make reliable decisions in rare, high-ris…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-04-05 · Gunn Kim
Research Track A · General AI
Continual learning in artificial neural networks is fundamentally limited by the stability-plasticity dilemma: systems that retain prior knowledge tend to resist acquiring new knowledge, and vice versa. Existing approaches, most notably elastic weight consolidation (EWC), address this empirically without a physical ac…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-04-07 · Manuel Barusco, Francesco Borsatti, David Petrovic, Davide Dalle Pezze, Gian Antonio Susto
Research Track A · General AI
Visual Anomaly Detection (VAD) is a critical task for many applications including industrial inspection and healthcare. While VAD has been extensively studied, two key challenges remain largely unaddressed in conjunction: edge deployment, where computational resources are severely constrained, and continual learning, w…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-04-09 · Danit Yanowsky, Daphna Weinshall
Research Track A · General AI
Catastrophic forgetting remains a key challenge in Continual Learning (CL). In replay-based CL with severe memory constraints, performance critically depends on the sample selection strategy for the replay buffer. Most existing approaches construct memory buffers using embeddings learned under supervised objectives. Ho…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-21 · Fan Li, Chonghuinan Wang, Lina Lei, Yuping Qiu, Jiaqi Xu, Jiaxiu Jiang, Xinran Qin, Zhikai Chen, Fenglong Song, Zhixin Wang, Renjing Pei, Wangmeng Zuo
General AI
Common image editing tasks typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Meanwhile, although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from H…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-21 · Bobo Li, Rui Wu, Zibo Ji, Meishan Zhang, Hao Fei, Min Zhang, Mong-Li Lee, Wynne Hsu
General AI
Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-21 · Qihua Dong, Gozde Sahin, Pei Wang, Zhaowei Cai, Robik Shrestha, Hao Yang, Davide Modolo
General AI
In this paper, we investigate the problem of how to effectively master tool-use to solve complex visual reasoning tasks for Multimodal Large Language Models. To achieve that, we propose a novel Tool-supervised Reinforcement Learning (ToolsRL) framework, with direct tool supervision for more effective tool-use learning.…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-13 · Stefan Miteski
Research Track A · General AI
Retrieval-Augmented Generation remains the dominant pattern for giving LLMs persistent memory, but a visible cluster of personal wiki-style memory architectures emerged in April 2026: design proposals from Karpathy, MemPalace, and LLM Wiki v2 that compile knowledge into an interlinked artifact for long-term use by a …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-14 · Sohyun An, Shuibenyang Yuan, Hayeon Lee, Cho-Jui Hsieh, Alexander Min
General AI
Reinforcement Learning (RL) has shown strong potential for optimizing search agents in complex information retrieval tasks. However, existing approaches predominantly rely on gold supervision, such as ground-truth answers, which is difficult to scale. To address this limitation, we propose Cycle-Consistent Search (CCS)…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-16 · Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Huangyu Dai, Lingtao Mao, Xuxin Zhang, Chenyi Lei, Wenwu Ou
General AI
Reinforcement learning has emerged as an effective paradigm for training large language models to perform search-augmented reasoning. However, existing approaches rely on trajectory-level rewards that cannot distinguish precise search queries from vague or redundant ones within a rollout group, and collapse to a near-z…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-16 · Huanran Hu, Zihui Ren, Dingyi Yang, Liangyu Chen, Qixiang Gao, Tiezheng Ge, Qin Jin
General AI
Real-world video creation often involves a complex reasoning workflow of selecting relevant shots from noisy materials, planning missing shots for narrative completeness, and organizing them into coherent storylines. However, existing benchmarks focus on isolated sub-tasks and lack support for evaluating this full proc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-16 · Hao Gao, Shaoyu Chen, Yifan Zhu, Yuehao Song, Wenyu Liu, Qian Zhang, Xinggang Wang
General AI
High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities and the lack of cor…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-20 · Andrew Zhang, Tong Ding, Sophia J. Wagner, Caiwei Tian, Ming Y. Lu, Rowland Pettit, Joshua E. Lewis, Alexandre Misrahi, Dandan Mo, Long Phi Le, Faisal Mahmood
General AI
Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal…
arxiv
Score 16.3
2026-04-22 · Yuxuan Cai, Jie Zhou, Qin Chen, Liang He
Research Track A · General AI
Online lifelong learning enables agents to accumulate experience across interactions and continually improve on long-horizon tasks. However, existing methods typically treat retrieval from past experience as a passive operation, triggering it only at task initialization or after completing a step. Consequently, agents …
arxiv
Score 16.3
2026-04-22 · Inclusion AI, Tiwei Bie, Haoxing Chen, Tieyuan Chen, Zhenglin Cheng, Long Cui, Kai Gan, Zhicheng Huang, Zhenzhong Lan, Haoquan Li, Jianguo Li, Tao Lin, Qi Qin, Hongjun Wang, Xiaomei Wang, Haoyuan Wu, Yi Xin, Junbo Zhao
General AI
We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous vi…
arxiv
Score 16.3
2026-04-22 · Qiguang Chen, Chengyu Luan, Jiajun Wu, Qiming Yu, Yi Yang, Yizhuo Li, Jingqi Tong, Xiachong Feng, Libo Qin, Wanxiang Che
General AI
Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Nevertheless, current Olympiad-level multimodal reasoning benchmarks for these models often emphasize single-image analysis and fail to exploit contextual information across multiple images. We present OMIBench…
arxiv
Score 16.3
2026-04-22 · Dongding Lin, Jian Wang, Yongqi Li, Wenjie Li
General AI
Situated conversational recommendation (SCR), which utilizes visual scenes grounded in specific environments and natural language dialogue to deliver contextually appropriate recommendations, has emerged as a promising research direction due to its close alignment with real-world scenarios. Compared to traditional reco…
arxiv
Score 16.3
2026-04-23 · Praval Sharma
General AI
Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios. However, existing event extraction approaches have limitations: (1) closed-domain algorithms are restricted to predefined event types and thus rarely generaliz…
arxiv
Score 16.3
2026-04-23 · Chee Wei Tan, Yuchen Wang, Shangxin Guo
General AI
This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy L…
arxiv
Score 16.3
2026-04-24 · Jinghong Chen, Jingbiao Mei, Guangyu Yang, Bill Byrne
General AI
A common approach to question answering with retrieval-augmented generation (RAG) is to concatenate documents into a single context and pass it to a language model to generate an answer. While simple, this strategy can obscure the contribution of individual documents, making attribution difficult and contributing to th…
arxiv
Score 16.3
2026-05-12 · Di Wu, Zixiang Ji, Asmi Kawatkar, Bryan Kwan, Jia-Chen Gu, Nanyun Peng, Kai-Wei Chang
Research Track B · General AI
Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, existing memory benchmarks for agents mostly focus on user histories, short traces, or downstream task success, leaving open …
arxiv
Score 16.3
2026-05-12 · Haiwen Diao, Penghao Wu, Hanming Deng, Jiahao Wang, Shihao Bai, Silei Wu, Weichen Fan, Wenjie Ye, Wenwen Tong, Xiangyu Fan, Yan Li, Yubo Wang, Zhijie Cao, Zhiqian Lin, Zhitao Yang, Zhongang Cai, Yuwei Niu, Yue Zhu, Bo Liu, Chengguang Lv, Haojia Yu, Haozhe Xie, Hongli Wang, Jianan Fan, Jiaqi Li, Jiefan Lu, Jingcheng Ni, Junxiang Xu, Kaihuan Liang, Lianqiang Shi, Linjun Dai, Linyan Wang, Oscar Qian, Peng Gao, Pengfei Liu, Qingping Sun, Rui Shen, Ruisi Wang, Shengnan Ma, Shuang Yang, Siyi Xie, Siying Li, Tianbo Zhong, Xiangli Kong, Xuanke Shi, Yang Gao, Yongqiang Yao, Yves Wang, Zhengqi Bai, Zhengyu Lin, Zixin Yin, Wenxiu Sun, Ruihao Gong, Quan Wang, Lewei Lu, Lei Yang, Ziwei Liu, Dahua Lin
General AI
Recent large vision-language models (VLMs) remain fundamentally constrained by a persistent dichotomy: understanding and generation are treated as distinct problems, leading to fragmented architectures, cascaded pipelines, and misaligned representation spaces. We argue that this divide is not merely an engineering arti…
arxiv
Score 16.2
2026-04-29 · Fei Bai, Huatong Song, Shuang Sun, Daixuan Cheng, Yike Yang, Chuan Hao, Renyuan Li, Feng Chang, Yuan Wei, Ran Tao, Bryan Dai, Jian Yang, Wayne Xin Zhao
General AI
Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent trai…
arxiv
Score 16.2
2026-04-29 · Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan
General AI
Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference steps within a single architecture, none address cross-architecture knowledge t…
arxiv
Score 16.2
2026-04-30 · Neemias B da Silva, Rodrigo Minetto, Daniel Silver, Thiago H Silva
General AI
Large Language Models (LLMs) are increasingly used as proxies for human perception in urban analysis, yet it remains unclear whether persona prompting produces meaningful and reproducible behavioral diversity. We investigate whether distinct personas influence urban sentiment judgments generated by multimodal LLMs. Usi…
arxiv
Score 16.2
2026-05-01 · Ziyang Huang, Yi Cao, Ali K. Shargh, Jing Luo, Ruidong Mei, Mohd Zaki, Zhan Liu, Wyatt Bunstine, William Jurayj, Somdatta Goswami, Tyrel McQueen, Michael Shields, Jaafar El-Awady, Paulette Clancy, Benjamin Van Durme, Nicholas Andrews, William Walden, Daniel Khashabi
General AI
Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational scientific workflows, where tasks require not only strong coding ability, but also the ab…
arxiv
Score 16.2
2026-05-01 · Xihao Chen, Yangyang Guo, Roger Zimmermann
General AI
Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the …
huggingface
Score 16.0
2026-03-17 · Jian Yang, Wei Zhang, Shawn Guo, Zhengmao Ye, Lin Jing, Shark Liu, Yizhi Li, Jiajun Wu, Cening Liu, X. Ma, Yuyang Song, Siwei Wu, Yuwen Li, L. Liao, T. Zheng, Ziling Huang, Zelong Huang, Che Liu, Yan Xing, Renyuan Li, Qingsong Cai, Hanxu Yan, Siyue Wang, Shikai Li, Jason Klein Liu, An Huang, Yongsheng Kang, Jinxing Zhang, Chuan Hao, Haowen Wang, Weicheng Gu, Ran Tao, Mingjie Tang, Peihao Wu, Jianzhou Wang, Xianglong Liu, Weifeng Lv, Bryan Dai
General AI
In this report, we introduce the IQuest-Coder-V1 series-(7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic through different phases of the pipe…
huggingface
Score 16.0
2026-03-26 · Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang, Chao Zhang, Chen Zhang, Yuhang Zang, Fei Yuan, Jiakang Yuan, Jiashuo Yu, Jinhui Yin, Haochen Ye, Qian Yao, Bowen Yang, Danni Yang, Kaichen Yang, Ziang Yan, Jun Xu, Yicheng Xu, Wanghan Xu, Xuenan Xu, Chao Xu, Ruiliang Xu, Shuhao Xing, Long Xing, Xinchen Xie, Ling-I Wu, Zijian Wu, Zhenyu Wu, Lijun Wu, Yue Wu, Jianyu Wu, Wen Wu, Fan Wu, Xilin Wei, Qi Wei, Bingli Wang, Rui Wang, Ziyi Wang, Zun Wang, Yi Wang, Haomin Wang, Yizhou Wang, Lintao Wang, Yiheng Wang, Longjiang Wang, Bin Wang, Jian Tong, Zhongbo Tian, Huanze Tang, Chen Tang, Shixiang Tang, Yu Sun, Qiushi Sun, Xuerui Su, Qisheng Su, Chenlin Su, Demin Song, Jin Shi, Fukai Shang, Yuchen Ren, Pengli Ren, Xiaoye Qu, Yuan Qu, Jiantao Qiu, Yu Qiao, Runyu Peng, Tianshuo Peng, Jiahui Peng, Qizhi Pei, Zhuoshi Pan, Linke Ouyang, Wenchang Ning, Yichuan Ma, Zerun Ma, Ningsheng Ma, Runyuan Ma, Chengqi Lyu, Haijun Lv, Han Lv, Lindong Lu, Kuikun Liu, Jiangning Liu, Yuhong Liu, Kai Liu, Hongwei Liu, Zhoumianze Liu, Mengjie Liu, Ziyu Liu, Wenran Liu, Yang Liu, Liwei Liu, Kaiwen Liu, Junyao Lin, Junming Lin, Tianyang Lin, Dahua Lin, Jianze Liang, Linyang Li, Peiji Li, Zonglin Li, Zehao Li, Pengze Li, Guoyan Li, Lingkai Kong, Linglin Jing, Zhenjiang Jin, Feifei Jiang, Qian Jiang, Junhao Huang, Zixian Huang, Haian Huang, Zhouqi Hua, Han Hu, Linfeng Hou, Yinan He, Conghui He, Tianyao He, Xu Guo, Qipeng Guo, Aijia Guo, Yuzhe Gu, Lixin Gu, Jingyang Gong, Qiming Ge, Jiaye Ge, Songyang Gao, Jianfei Gao, Xinyu Fang, Caihua fan, Yue Fan, Yanhui Duan, Zichen Ding, Shengyuan Ding, Xuanlang Dai, Erfei Cui, Ganqu Cui, Pei Chu, Tao Chu, Guangran Cheng, Yu Cheng, Kai Chen, Yongkang Chen, Chiyu Chen, Guanzhou Chen, Qiaosheng Chen, Sitao Chen, Xin Chen, Haojiong Chen, Yicheng Chen, Weihan Cao, Yuhang Cao, Qinglong Cao, Lei Bai
General AI
We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is aug…
arxiv
Score 16.0
2026-03-29 · Chongyang Zhao, Mingsong Li, Haodong Lu, Dong Gong
Research Track A · General AI
Multimodal Continual Instruction Tuning aims to continually enhance Large Vision Language Models (LVLMs) by learning from new data without forgetting previously acquired knowledge. Mixture of Experts (MoE) architectures naturally facilitate this by incrementally adding new experts and expanding routers while keeping th…
huggingface
Score 16.0
2026-03-31 · Shuang Chen, Quanxin Shou, Hangting Chen, Yucheng Zhou, Kaituo Feng, Wenbo Hu, Yi-Fan Zhang, Yunlong Lin, Wenxuan Huang, Mingyang Song, Dasen Dai, Bolin Jiang, Manyuan Zhang, Shi-Xue Zhang, Zhengkai Jiang, Lucas Wang, Zhao Zhong, Yu Cheng, Nanyun Peng
General AI
Unified multimodal models provide a natural and promising architecture for understanding diverse and complex real-world knowledge while generating high-quality images. However, they still rely primarily on frozen parametric knowledge, which makes them struggle with real-world image generation involving long-tail and kn…
arxiv
Score 16.0
2026-04-03 · Lei Song, Shihan Guan, Youyong Kong
Research Track A · General AI
Non-Exemplar Continual Graph Learning (NECGL) seeks to eliminate the privacy risks intrinsic to rehearsal-based paradigms by retaining solely class-level prototype representations rather than raw graph examples for mitigating catastrophic forgetting. However, this design choice inevitably precipitates feature drift. As…
huggingface
Score 16.0
2026-04-04 · Yuqi Zhu, Jintian Zhang, Zhenjie Wan, Yujie Luo, Shuofei Qiao, Zhengke Gui, Da Zheng, Lei Liang, Huajun Chen, Ningyu Zhang
General AI
Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compr…
arxiv
Score 16.0
2026-04-06 · Seoyoung Park, Haemin Lee, Hankook Lee
Research Track A · General AI
Task-free online continual learning has recently emerged as a realistic paradigm for addressing continual learning in dynamic, real-world environments, where data arrive in a non-stationary stream without clear task boundaries and can only be observed once. To consider such challenging scenarios, many recent approaches…
arxiv
Score 16.0
2026-04-14 · Yifei Yan, Linqi Ye
Research Track A · General AI
As reinforcement learning for humanoid robots evolves from single-task to multi-skill paradigms, efficiently expanding new skills while avoiding catastrophic forgetting has become a key challenge in embodied intelligence. Existing approaches either rely on complex topology adjustments in Mixture-of-Experts (MoE) models…
arxiv
Score 16.0
2026-04-22 · Saish Sachin Shinde
Research Track A · General AI
We present SCM (Sleep-Consolidated Memory), a research preview of a memory architecture for large language models that draws on neuroscientific principles to address a fundamental limitation in current systems: the absence of persistent, structured, and biologically plausible memory. Existing approaches rely on truncat…
arxiv
Score 15.9
2026-04-29 · Aditya A. Ramesh, Alex Lewandowski, Jürgen Schmidhuber
Research Track A · General AI
Continual learning agents with finite capacity must balance acquiring new knowledge with retaining the old. This requires controlled forgetting of knowledge that is no longer needed, freeing up capacity to learn. Weight decay, viewed as a mechanism for forgetting, can serve this role by gradually discarding information…
arxiv
Score 15.8
2026-03-30 · Songjun Tu, Chengdong Xu, Qichao Zhang, Yaocheng Zhang, Xiangyuan Lan, Linjing Li, Dongbin Zhao
General AI
Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that o…
arxiv
Score 15.8
2026-03-31 · Davide Di Gioia
General AI
Current autonomous AI agents, driven primarily by Large Language Models (LLMs), operate in a state of cognitive weightlessness: they process information without an intrinsic sense of network topology, temporal pacing, or epistemic limits. Consequently, heuristic agentic loops (e.g., ReAct) can exhibit failure modes in …
arxiv
Score 15.8
2026-04-02 · Payal Fofadiya, Sunil Tiwari
Research Track A · General AI
Long-horizon conversational agents require persistent memory for coherent reasoning, yet uncontrolled accumulation causes temporal decay and false memory propagation. Benchmarks such as LOCOMO and LOCCO report performance degradation from 0.455 to 0.05 across stages, while MultiWOZ shows 78.2% accuracy with 6.8% false …
arxiv
Score 15.8
2026-04-02 · Xueying Li, Feng Lyu, Hao Wu, Mingliu Liu, Jia-Nan Liu, Guozi Liu
General AI
Training-free Vision-Language Navigation (VLN) agents powered by foundation models can follow instructions and explore 3D environments. However, existing approaches rely on greedy frontier selection and passive spatial memory, leading to inefficient behaviors such as local oscillation and redundant revisiting. We argue…
arxiv
Score 15.8
2026-04-06 · Sixun Dong, Juhua Hu, Steven Li, Wei Wen, Qi Qian
General AI
Most vision-language models (VLMs) apply a large language model (LLM) as the decoder, where the response tokens are generated sequentially through autoregression. Therefore, the number of output tokens can be the bottleneck of the end-to-end latency. However, different models may require vastly different numbers of out…
arxiv
Score 15.8
2026-04-09 · Yifang Wang, Rui Sheng, Erzhuo Shao, Yifan Qian, Haotian Li, Nan Cao, Dashun Wang
General AI
Large language models (LLMs) are transforming scientific workflows, not only through their generative capabilities but also through their emerging ability to use tools, reason about data, and coordinate complex analytical tasks. Yet in most human-AI collaborations, the primary outputs, figures, are still treated as sta…
arxiv
Score 15.8
2026-04-28 · Qianqian Chen, Anglin Liu, Jingyang Zhang, Yudong Zhang
Research Track A · General AI
Accurate brain lesion segmentation in MRI is vital for effective clinical diagnosis and treatment planning. Due to high annotation costs and strict data privacy regulations, universal models require employing Continual Learning (CL) to adapt to evolving clinical tasks without losing previously acquired knowledge. Howev…
arxiv
Score 15.8
2026-05-07 · Hyeongwon Kang, Jeongseob Kim, Jinwoo Park, Pilsung Kang
General AI
Recent studies have explored large language models for time-series anomaly detection, yet existing approaches often rely on a single general-purpose model to directly infer anomaly indices or intervals, limiting controllability, interpretability, and reliability for complex anomaly patterns. We propose SAGE (Specialize…
arxiv
Score 15.8
2026-05-07 · Isaac David, Arthur Gervais
General AI
Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many operational settings, the most accessible artifacts are binary packages rather than source patches or advisory text. This paper asks whether a language-model agent, restricted t…
arxiv
Score 15.8
2026-05-07 · Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava
General AI
Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcomer searches an unfam…
arxiv
Score 15.5
2026-04-08 · Ziqiao Ma, Xueyang Yu, Haoyu Zhen, Yuncong Yang, Joyce Chai, Chuang Gan
Research Track A · General AI
Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling…
arxiv
Score 15.5
2026-04-13 · Peng Yuan, Yuyang Yin, Yuxuan Cai, Zheng Wei
Research Track B · General AI
Existing browser agent benchmarks face a fundamental trilemma: real-website benchmarks lack reproducibility due to content drift, controlled environments sacrifice realism by omitting real-web noise, and both require costly manual curation that limits scalability. We present WebForge, the first fully automated framewor…
huggingface
Score 15.5
2026-04-14 · Linhao Yu, Tianmeng Yang, Siyu Ding, Renren Jin, Naibin Gu, Xiangzhao Hao, Shuaiyi Nie, Deyi Xiong, Weichong Yin, Yu Sun, Hua Wu
General AI
RLVR improves reasoning in large language models, but its effectiveness is often limited by severe reward sparsity on hard problems. Recent hint-based RL methods mitigate sparsity by injecting partial solutions or abstract templates, yet they typically scale guidance by adding more tokens, which introduce redundancy, i…
huggingface
Score 15.5
2026-04-14 · NVIDIA, Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye, Abhibha Gupta, Abhilash Somasamudramath, Abhinav Khattar, Adeola Adesoba, Adi Renduchintala, Adil Asif, Aditya Agrawal, Aditya Vavre, Ahmad Kiswani, Aishwarya Padmakumar, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Gronskiy, Alex Kondratenko, Alex Neefus, Alex Steiner, Alex Yang, Alexander Bukharin, Alexander Young, Ali Hatamizadeh, Ali Taghibakhshi, Alina Galiautdinova, Alisa Liu, Alok Kumar, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Anahita Bhiwandiwalla, Ananth Subramaniam, Andrew Tao, Anjaney Shrivastava, Anjulie Agrusa, Ankur Srivastava, Ankur Verma, Ann Guan, Anna Shors, Annamalai Chockalingam, Anubhav Mandarwal, Aparnaa Ramani, Arham Mehta, Arti Jain, Arun Venkatesan, Asha Anoosheh, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asli Sabanci Demiroz, Asma Kuriparambil Thekkumpate, Atefeh Sohrabizadeh, Avinash Kaur, Ayush Dattagupta, Barath Subramaniam Anandan, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Benjamin Chislett, Besmira Nushi, Bilal Kartal, Bill Thiede, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Branislav Kisacanin, Brian Yu, Bryan Catanzaro, Buvaneswari Mani, Carlo del Mundo, Chankyu Lee, Chanran Kim, Chantal Hwang, Chao Ni, Charles Wang, Charlie Truong, Cheng-Ping Hsieh, Chenhan Yu, Chenjie Luo, Cherie Wang, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Chris Holguin, Chris Wing, Christian Munley, Christopher Parisien, Chuck Desai, Chunyang Sheng, Collin Neale, Cyril Meurillon, Dakshi Kumar, Dan Gil, Dan Su, Dane Corneil, Daniel Afrimi, Daniel Burkhardt Eliuth Triana, Daniel Egert, Daniel Fatade, Daniel Lo, Daniel Rohrer, Daniel Serebrenik, Daniil Sorokin, Daria Gitman, Daria Levy, Darko Stosic, David Edelsohn, David Messina, David Mosallanezhad, David Tamok, Deena Donia, Deepak Narayanan, Devin O'Kelly, Dheeraj Peri, Dhruv Nathawani, Di Wu, Dima Rekesh, Dina Yared, Divyanshu Kakwani, Dmitry Konyagin Brandon Tuttle, Dong Ahn, Dongfu Jiang, Dorrin Poorkay, Douglas O'Flaherty, Duncan Riach, Dusan Stosic, Dustin Van Stee, Edgar Minasyan, Edward Lin, Eileen Peters Long, Elad Segal, Elena Lantz, Elena Lewis, Ellie Evans, Elliott Ning, Eric Chung, Eric Harper, Eric Pham-Hung, Eric W. Tramel, Erick Galinkin, Erik Pounds, Esti Etrog, Evan Briones, Evan Wu, Evelina Bakhturina, Evgeny Tsykunov, Ewa Dobrowolska, Farshad Saberi Movahed, Farzan Memarian, Fay Wang, Fei Jia, Felipe Soares, Felipe Vieira Frujeri, Feng Chen, Fengguang Lin, Ferenc Galko, Fortuna Zhang, Frankie Siino, Frida Hou, Gantavya Bhatt, Gargi Prasad, Geethapriya Venkataramani, Geetika Gupta, George Armstrong, Gerald Shen, Giulio Borghesi, Gordana Neskovic, Gorkem Batmaz, Grace Lam, Grace Wu, Greg Pauloski, Greyson Davis, Grigor Nalbandyan, Guoming Zhang, Guy Farber, Guyue Huang, Haifeng Qian, Haran Kumar Shiv Kumar, Harry Kim, Harsh Sharma, Hayate Iso, Hayley Ross, Herbert Hum, Herman Sahota, Hexin Wang, Himanshu Soni, Hiren Upadhyay, Huy Nguyen, Iain Cunningham, Ido Galil, Ido Shahaf, Igino Padovani, Igor Gitman, Igor Shovkun, Ikroop Dhillon, Ilya Loshchilov, Ingrid Kelly, Itamar Schen, Itay Levy, Ivan Moshkov, Izik Golan, Izzy Putterman, Jain Tu, Jan Baczek, Jan Kautz, Jane Polak Scowcroft, Janica Rosenberg, Jared Casper, Jarrod Pflum, Jason Grant, Jason Sewall, Jatin Mitra, Jeffrey Glick, Jenny Chen, Jesse Oliver, Jiacheng Xu, Jiafan Zhu, Jialin Song, Jian Zhang, Jiaqi Zeng, Jie Lou, Jill Milton, Jim Chow, Jimmy Zhang, Jinhang Choi, Jining Huang, Jocelyn Huang, Joel Caruso, Joey Conway, Joey Guman, Johan Jatko, John Kamalu, Johnny Greco, Jonathan Cohen, Jonathan Raiman, Joseph Jennings, Joyjit Daw, Juan Yu, Julio Tapia, Junkeun Yi, Jupinder Parmar, Jyothi Achar, Kari Briski, Kartik Mattoo, Katherine Cheung, Katherine Luna, Keith Wyss, Kevin Shih, Kezhi Kong, Khanh Nguyen, Khushi Bhardwaj, Kirill Buryak, Kirthi Shankar Sivamani, Konstantinos Krommydas, Kris Murphy, Krishna C. Puvvada, Krzysztof Pawelec, Kumar Anik, Laikh Tewari, Laya Sleiman, Leo Du, Leon Derczynski, Li Ding, Lilach Ilan, Lingjie Wu, Lizzie Wei, Luis Vega, Lun Su, Maarten Van Segbroeck, Maer Rodrigues de Melo, Magaret Zhang, Mahan Fathi, Makesh Narsimhan Sreedhar, Makesh Sreedhar, Makesh Tarun Chandran, Manuel Reyes Gomez, Maor Ashkenazi, Marc Cuevas, Marc Romeijn, Margaret Zhang, Mark Cai, Mark Gabel, Markus Kliegl, Martyna Patelka, Maryam Moosaei, Matthew Varacalli, Matvei Novikov, Mauricio Ferrato, Mehrzad Samadi, Melissa Corpuz, Meng Xin, Mengdi Wang, Mengru Wang, Meredith Price, Micah Schaffer, Michael Andersch, Michael Boone, Michael Evans, Michael Z Wang, Miguel Martinez, Mikail Khona, Mike Chrzanowski, Mike Hollinger, Mingyuan Ma, Minseok Lee, Mohammad Dabbah, Mohammad Shoeybi, Mostofa Patwary, Nabin Mulepati, Nader Khalil, Najeeb Nabwani, Nancy Agarwal, Nanthini Balasubramaniam, Narimane Hennouni, Narsi Kodukula, Natalie Hereth, Nathaniel Pinckney, Nave Assaf, Negar Habibi, Nestor Qin, Neta Zmora, Netanel Haber, Nick Reamaroon, Nickson Quak, Nidhi Bhatia, Nikhil Jukar, Nikki Pope, Nikolai Ludwig, Nima Tajbakhsh, Nir Ailon, Nirmal Juluru, Nirmalya De, Nowel Pitt, Oleg Rybakov, Oleksii Hrinchuk, Oleksii Kuchaiev, Olivier Delalleau, Oluwatobi Olabiyi, Omer Ullman Argov, Omri Almog, Omri Puny, Oren Tropp, Otavio Padovani, Ouye Xie, Parth Chadha, Pasha Shamis, Paul Gibbons, Pavlo Molchanov, Peter Belcak, Peter Jin, Pinky Xu, Piotr Januszewski, Pooya Jannaty, Prachi Shevate, Pradeep Thalasta, Pranav Prashant Thombre, Prasoon Varshney, Prerana Gambhir, Pritam Gundecha, Przemek Tredak, Qing Miao, Qiyu Wan, Quan Tran Minh, Rabeeh Karimi Mahabadi, Rachel Oberman, Rachit Garg, Rahul Kandu, Raina Zhong, Ran El-Yaniv, Ran Zilberstein, Rasoul Shafipour, Renee Yao, Renjie Pi, Richard Mazzarese, Richard Wang, Rick Izzo, Ridhima Singla, Rima Shahbazyan, Rishabh Garg, Ritika Borkar, Ritu Gala, Riyad Islam, Robert Clark, Robert Hesse, Roger Waleffe, Rohit Varma Kalidindi, Rohit Watve, Roi Koren, Ron Fan, Ruchika Kharwar, Ruisi Cai, Ruoxi Zhang, Russell J. Hewett, Ryan Prenger, Ryan Timbrook, Ryota Egashira, Sadegh Mahdavi, Sagar Singh Ashutosh Joshi, Sahil Modi, Samuel Kriman, Sandeep Pombra, Sanjay Kariyappa, Sanjeev Satheesh, Santiago Pombo, Saori Kaji, Satish Pasumarthi, Saurav Mishra, Saurav Muralidharan, Scott Hara, Sean Narenthiran, Sebastian Rogawski, Seonjin Na, Seonmyeong Bak, Sepehr Sameni, Seth Poulos, Shahar Mor, Shantanu Acharya, Shaona Ghosh Adam Lord, Sharath Turuvekere Sreenivas, Shaun Kotek, Shaya Gharghabi, Shelby Thomas, Sheng-Chieh Lin, Shibani Likhite, Shiqing Fan, Shiyang Chen, Shreya Gopal, Shrimai Prabhumoye, Shubham Pachori, Shubham Toshniwal, Shuo Zhang, Shuoyang Ding, Shyam Renjith, Shyamala Prayaga, Siddhartha Jain, Simeng Sun, Sirisha Rella, Sirshak Das, Smita Ithape, Sneha Harishchandra S, Somshubra Majumdar, Soumye Singhal, Sri Harsha Singudasu, Sriharsha Niverty, Stas Sergienko, Stefana Gloginic, Stefania Alborghetti, Stephen Ge, Stephen McCullough, Sugam Dipak Devare, Suguna Varshini Velury, Sukrit Rao, Sumeet Kumar Barua, Sunny Gai, Suseella Panguluri, Sushil Koundinyan, Swathi Patnam, Sweta Priyadarshi, Swetha Bhendigeri, Syeda Nahida Akter, Sylendran Arunagiri, Tailling Yuan, Talor Abramovich, Tan Bui, Tan Yu, Terry Kong, Thanh Do, Thomas Gburek, Thorgane Marques, Tiffany Moore, Tijmen Blankevoort, Tim Moon, Timothy Ma, Tiyasa Mitra, Tomasz Grzegorzek, Tomer Asida, Tomer Bar Natan, Tomer Keren, Tomer Ronen, Traian Rebedea, Trenton Starkey, Tugrul Konuk, Twinkle Vashishth, Tyler Condensa, Udi Karpas, Ushnish De, Vahid Noorozi, Vahid Noroozi, Vanshil Atul Shah, Veena Vaidyanathan, Venkat Srinivasan, Venmugil Elango, Victor Cui, Vijay Korthikanti, Vikas Mehta, Virginia Adams, Virginia Wu, Vitaly Kurin, Vitaly Lavrukhin, Vladimir Anisimov, Wan Seo, Wanli Jiang, Wasi Uddin Ahmad, Wei Du, Wei Ping, Wei-Ming Chen, Wendy Quan, Wenliang Dai, Wenwen Gao, Will Jennings, William Zhang, Xiaowei Ren, Xiaowen Xin, Xin Li, Yang Yu, Yangyi Chen, Yaniv Galron, Yashaswi Karnati, Yejin Choi, Yev Meyer, Yi-Fu Wu, Yian Zhang, Ying Lin, Yonatan Geifman, Yonggan Fu, Yoshi Suhara, Youngeun Kwon, Yuan Zhang, Yuki Huang, Zach Moshe, Zhilin Wang, Zhiyu Cheng, Zhongbo Zhu, Zhuolin Yang, Zihan Liu, Zijia Chen, Zijie Yan, Zuhair Ahmed
General AI
We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts arch…
arxiv
Score 15.5
2026-04-15 · Muhammad Ahmed Ullah Khan, Muhammad Haris Bin Amir, Didier Stricker, Muhammad Zeshan Afzal
Research Track A · General AI
Continual learning enables models to acquire new knowledge over time while retaining previously learned capabilities. However, its application to text-to-3D generation remains unexplored. We present ReConText3D, the first framework for continual text-to-3D generation. We first demonstrate that existing text-to-3D model…
huggingface
Score 15.5
2026-04-16 · Bowen Ping, Zijun Chen, Tingfeng Hui, Qize Yu, Chenxuan Li, Junchi Yan, Baobao Chang
General AI
Reinforcement Learning (RL) has emerged as a critical driver for enhancing the reasoning capabilities of Large Language Models (LLMs). While recent advancements have focused on reward engineering or data synthesis, few studies exploit the model's intrinsic representation characteristics to guide the training process. I…
huggingface
Score 15.5
2026-04-20 · Martiño Ríos-García, Nawaf Alampara, Chandan Gupta, Indrajeet Mandal, Sajid Mannan, Ali Asghar Aghajani, N. M. Anoop Krishnan, Kevin Maik Jablonka
General AI
Large language model (LLM)-based systems are increasingly deployed to conduct scientific research autonomously, yet whether their reasoning adheres to the epistemic norms that make scientific inquiry self-correcting is poorly understood. Here, we evaluate LLM-based scientific agents across eight domains, spanning workf…
huggingface
Score 15.5
2026-04-20 · Guanting Dong, Junting Lu, Junjie Huang, Wanjun Zhong, Longxiang Liu, Shijue Huang, Zhenyu Li, Yang Zhao, Xiaoshuai Song, Xiaoxi Li, Jiajie Jin, Yutao Zhu, Hanbin Wang, Fangyu Lei, Qinyu Luo, Mingyang Chen, Zehui Chen, Jiazhan Feng, Ji-Rong Wen, Zhicheng Dou
General AI
Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limi…
huggingface
Score 15.5
2026-04-21 · Venus Team, Sunhao Dai, Yong Deng, Jinzhen Lin, Yusheng Song, Guoqing Wang, Xiaofeng Wu, Yuqi Zhou, Shuo Yang, Zhenzhe Ying, Zhanwei Zhang, Changhua Meng, Weiqiang Wang
General AI
Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open-data by improving both data quality and data utilization. We present DR…
huggingface
Score 15.5
2026-04-25 · Víctor Gallego
General AI
Can large language model agents discover hidden safety objectives through experience alone? We introduce EPO-Safe (Experiential Prompt Optimization for Safe Agents), a framework where an LLM iteratively generates action plans, receives sparse binary danger warnings, and evolves a natural language behavioral specificati…
huggingface
Score 15.5
2026-05-12 · Zhong Guan, Yongjian Guo, Haoran Sun, Wen Huang, Shuai Di, Xiong Jun Wu, Likang Wu, Hongke Zhao
General AI
Asynchronous reinforcement learning improves rollout throughput for large language model agents by decoupling sample generation from policy optimization, but it also introduces a critical failure mode for PPO-style off-policy correction. In heterogeneous training systems, the total importance ratio should ideally be de…
huggingface
Score 15.4
2026-04-29 · Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Yuxiang Zhou, Maria Liakata, Nikolaos Aletras
General AI
Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific p…
huggingface
Score 15.4
2026-05-04 · Haixin Wang, Hejie Cui, Chenwei Zhang, Xin Liu, Shuowei Jin, Shijie Geng, Xinyang Zhang, Nasser Zalmout, Zhenyu Shi, Yizhou Sun
General AI
Recent progress in multi-turn reinforcement learning (RL) has significantly improved reasoning LLMs' performances on complex interactive tasks. Despite advances in stabilization techniques such as fine-grained credit assignment and trajectory filtering, instability remains pervasive and often leads to training collapse…
arxiv
Score 15.3
2026-04-13 · Junlin Liu, Shengnan An, Shuang Zhou, Dan Ma, Shixiong Luo, Ying Xie, Yuan Zhang, Wenling Yuan, Yifan Zhou, Xiaoyu Li, Ziwen Wang, Xuezhi Cao, Xunliang Cai
General AI
Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts (often termed general reasoning) remains under-explored. Unlik…
arxiv
Score 15.3
2026-04-14 · Ya-Qi Yu, Fangyu Hong, Xiangyang Qu, Hao Wang, Gaojie Wu, Qiaoyu Luo, Nuo Xu, Huixin Wang, Wuheng Xu, Yongxin Liao, Zihao Chen, Haonan Li, Ziming Li, Dezhi Peng, Minghui Liao, Jihao Wu, Haoyu Ren, Dandan Tu
General AI
The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Existing pipelines often rely on off-policy perturbations or coarse outcome-based signals, which are not well suited to fine-grained visual reasoning. We propose rDP…
arxiv
Score 15.3
2026-04-15 · Zhuofeng Li, Yi Lu, Dongfu Jiang, Haoxiang Zhang, Yuyang Bai, Chuan Li, Yu Wang, Shuiwang Ji, Jianwen Xie, Yu Zhang
Research Track A · General AI
The rapid rise in AI conference submissions has driven increasing exploration of large language models (LLMs) for peer review support. However, LLM-based reviewers often generate superficial, formulaic comments lacking substantive, evidence-grounded feedback. We attribute this to the underutilization of two key compone…
arxiv
Score 15.3
2026-04-16 · Emanuel Tewolde, Xiao Zhang, David Guzman Piedrahita, Vincent Conitzer, Zhijing Jin
General AI
It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet recent works report the opposite trend: LLMs with stronger reasoning capabilities behave _less_ cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings. Indeed, our exp…
arxiv
Score 15.3
2026-04-16 · Hatice Merve Vural, Doga Kukul, Ege Erdem Ozlu, Demir Ekin Arikan, Bob Mankoff, Erkut Erdem, Aykut Erdem
General AI
Humor is one of the few cognitive tasks where getting the reasoning right matters as much as getting the answer right. While recent work evaluates humor understanding on benchmarks such as the New Yorker Cartoon Caption Contest (NYCC), it largely treats it as black-box prediction, overlooking the structured reasoning p…
arxiv
Score 15.3
2026-04-17 · Yige Xu, Yongjie Wang, Zizhuo Wu, Kaisong Song, Jun Lin, Zhiqi Shen
General AI
Reasoning in vision-language models (VLMs) has recently attracted significant attention due to its broad applicability across diverse downstream tasks. However, it remains unclear whether the superior performance of VLMs stems from genuine vision-grounded reasoning or relies predominantly on the reasoning capabilities …
arxiv
Score 15.3
2026-04-17 · Van-Truong Le
General AI
The complexity of Vietnam's legal texts presents a significant barrier to public access to justice. While Large Language Models offer a promising solution for legal text simplification, evaluating their true capabilities requires a multifaceted approach that goes beyond surface-level metrics. This paper introduces a co…
arxiv
Score 15.3
2026-04-17 · Seulgi Kim, Mohit Prabhushankar, Ghassan AlRegib
General AI
Vision-Language Models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance, where predictions rely disproportionately on a single modality. Prior approaches primarily address this issue by steering the model's attention allocation, implicitly assuming…
arxiv
Score 15.3
2026-04-17 · Xu Huang, Weixin Mao, Yinhao Li, Hua Chen, Jiabao Zhao
General AI
Vision-Language-Action (VLA) models have demonstrated significant potential for embodied decision-making; however, their application in complex chemical laboratory automation remains restricted by limited long-horizon reasoning and the absence of persistent experience accumulation. Existing frameworks typically treat p…
arxiv
Score 15.3
2026-04-20 · Xinyu Ma, Mingzhou Xu, Xuebo Liu, Chang Jin, Qiang Wang, Derek F. Wong, Min Zhang
General AI
Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial latent space. While offline teacher guidance and entropy-driven strategies have been proposed to add…
arxiv
Score 15.3
2026-04-21 · Yi Zhong, Buqiang Xu, Yijun Wang, Zifei Shan, Shuofei Qiao, Guozhou Zheng, Ningyu Zhang
General AI
At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write …
arxiv
Score 15.3
2026-04-22 · Joyjit Roy, Samaresh Kumar Singh
General AI
Security Operations Centers (SOCs) increasingly encounter difficulties in correlating heterogeneous alerts, interpreting multi-stage attack progressions, and selecting safe and effective response actions. This study introduces AgentSOC, a multi-layered agentic AI framework that enhances SOC automation by integrating pe…
arxiv
Score 15.3
2026-04-27 · Sercan Karakaş, Yusuf Şimşek
General AI
This paper investigates whether source trustworthiness shapes Turkish evidential morphology and whether large language models (LLMs) track this sensitivity. We study the past-domain contrast between -DI and -mIs in controlled cloze contexts where the information source is overtly external, while only its perceived reli…
arxiv
Score 15.3
2026-04-27 · Aaron J. Li, Nicolas Sanchez, Hao Huang, Ruijiang Dong, Jaskaran Bains, Katrin Jaradeh, Zhen Xiang, Bo Li, Feng Liu, Aaron Kornblith, Bin Yu
General AI
Large language models (LLMs) are increasingly deployed, yet their outputs can be highly sensitive to routine, non-adversarial variation in how users phrase queries, a gap not well addressed by existing red-teaming efforts. We propose Green Shielding, a user-centric agenda for building evidence-backed deployment guidanc…
arxiv
Score 15.3
2026-04-27 · Yunze Xiao, Vivienne J. Zhang, Chenghao Yang, Ningshan Ma, Weihao Xuan, Jen-tse Huang
General AI
Applications based on large language models (LLMs), such as multi-agent simulations, require population diversity among agents. We identify a pervasive failure mode we term _Persona Collapse_: agents each assigned a distinct profile nonetheless converge into a narrow behavioral mode, producing a homogeneous simula…
arxiv
Score 15.3
2026-05-12 · Junxian Li, Kai Liu, Zizhong Ding, Zhixin Wang, Zhikai Chen, Renjing Pei, Yulun Zhang
General AI
The development of separate-encoder Unified multimodal models (UMMs) comes with a rapidly growing inference cost due to dense visual token processing. In this paper, we focus on understanding-side visual token reduction for improving the efficiency of separate-encoder UMMs. While this topic has been widely studied for …
arxiv
Score 15.3
2026-05-12 · Guohui Zhang, XiaoXiao Ma, Jie Huang, Hang Xu, Hu Yu, Siming Fu, Yuming Li, Zeyue Xue, Lin Song, Haoyang Huang, Nan Duan, Feng Zhao
General AI
Recent advances in joint audio-video generation have been remarkable, yet real-world applications demand strong per-modality fidelity, cross-modal alignment, and fine-grained synchronization. Reinforcement Learning (RL) offers a promising paradigm, but its extension to multi-objective and multi-modal joint audio-video …
arxiv
Score 15.2
2026-04-29 · Bochao Liu, Zhipeng Qian, Yang Zhao, Xinyuan Jiang, Zihan Liang, Yufei Ma, Junpeng Zhuang, Ben Chen, Shuo Yang, Hongen Wan, Yao Wu, Chenyi Lei, Xiao Liang
General AI
Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but or…
arxiv
Score 15.2
2026-04-29 · Tianqi Gao, Chengkai Huang, Zihan Wang, Cao Liu, Ke Zeng, Lina Yao
General AI
Large language models (LLMs) have recently been adopted for recommendation by framing user preference modeling as a language generation problem. However, existing latent reasoning approaches typically represent user intent with a single latent vector, which struggles to capture the inherently multi-faceted nature of us…
arxiv
Score 15.2
2026-04-29 · Yuanze Hu, Gen Li, Yuqin Lan, Qingchen Yu, Zhichao Yang, Junwei Jing, Zhaoxin Fan, Xiaotie Deng
General AI
Multimodal large language models (MLLMs) have achieved impressive progress on general multimodal tasks, yet they remain brittle on dial-based measurement reading. In this paper, we study this problem through controlled benchmarks and feature-space probing, and show that current MLLMs not only achieve unsatisfactory acc…
arxiv
Score 15.2
2026-04-30 · Xin Zhou, Dingkang Liang, Xiwu Chen, Feiyang Tan, Dingyuan Zhang, Hengshuang Zhao, Xiang Bai
General AI
Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. Conversely, while Large Language Models (LLMs) demonstrate impressive reaso…
arxiv
Score 15.2
2026-04-30 · Keming Wu, Zuhao Yang, Kaichen Zhang, Shizun Wang, Haowei Zhu, Sicong Leng, Zhongyu Yang, Qijie Wang, Sudong Wang, Ziting Wang, Zili Wang, Hui Zhang, Haonan Wang, Hang Zhou, Yifan Pu, Xingxuan Li, Fangneng Zhan, Bo Li, Lidong Bing, Yuxin Song, Ziwei Liu, Wenhu Chen, Jingdong Wang, Xinchao Wang, Xiaojuan Qi, Shijian Lu, Bin Wang
General AI
Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis towa…
arxiv
Score 15.2
2026-04-30 · Ivan Bercovich
General AI
Terminal-agent benchmarks have become a primary signal for measuring the coding and system-administration capabilities of large language models. As the market for evaluation environments grows, so does the pressure to ship tasks quickly, often without thorough adversarial review of the verification logic. This paper is…
arxiv
Score 15.2
2026-05-01 · Saeid Jamshidi, Foutse Khomh, Carol Fung, Kawser Wazed Nafi
General AI
The adoption of Internet of Things (IoT) systems at the network edge of smart architectures is increasing rapidly, intensifying the need for security mechanisms that are both adaptive and resource-efficient. In such environments, runtime defence mechanisms are no longer limited to detection alone but become a resource-…
arxiv
Score 15.2
2026-05-04 · Hamza Ahmed Durrani, Rafay Suleman Durrani
General AI
The integration of Large Language Model (LLM) reasoning principles into classical robot path planning represents a rapidly emerging research direction. In this paper, we propose a Semantic Risk-Aware Heuristic (SRAH) planner that encodes LLM-inspired cost functions penalising geometrically cluttered or high-risk zones …
arxiv
Score 15.2
2026-05-04 · Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi, Xueli An
General AI
Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes in the design of network entities, interfaces, and procedures. The adoption of agentic AI in next-generation networks is expected to enhance network intelligence and auto…
arxiv
Score 15.0
2026-03-06 · Xudong Wang, Jiahua Dong, Baichen Liu, Qi Lyu, Lianqing Liu, Zhi Han
Research Track A · General AI
Embodied navigation agents powered by large language models have shown strong performance on individual tasks but struggle to continually acquire new navigation skills, which suffer from catastrophic forgetting. We formalize this challenge as lifelong embodied navigation learning (LENL), where an agent is required to a…
arxiv
Score 15.0
2026-03-15 · Jiayuan Du, Yuebing Song, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu, Qijun Chen
Research Track A · General AI
End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded…
huggingface
Score 15.0
2026-03-26 · Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Yuanheng Zhu, Dongbin Zhao
General AI
On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matching to a one-token sig…
huggingface
Score 15.0
2026-03-30 · Hongtao Wu, Boyun Zheng, Dingjie Song, Yu Jiang, Jianfeng Gao, Lei Xing, Lichao Sun, Yixuan Yuan
General AI
Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, limiting their applicability to clinical medicine, where research is required to be…
huggingface
Score 15.0
2026-04-02 · Yang Zhou, Xiaofeng Wang, Hao Shao, Letian Wang, Guosheng Zhao, Jiangnan Shao, Jiagang Zhu, Tingdong Yu, Zheng Zhu, Guan Huang, Steven L. Waslander
General AI
Recently, world-action models (WAM) have emerged to bridge vision-language-action (VLA) models and world models, unifying their reasoning and instruction-following capabilities and spatio-temporal world modeling. However, existing WAM approaches often focus on modeling 2D appearance or latent representations, with limi…
huggingface
Score 15.0
2026-04-02 · Difan Jiao, Qianfeng Wen, Blair Yang, Zhenwei Tang, Ashton Anderson
General AI
We introduce ThinkTwice, a simple two-phase framework that jointly optimizes LLMs to solve reasoning problems and refine the answers, based on Group Relative Policy Optimization (GRPO). In each pair of training steps, ThinkTwice first optimizes the model on solving reasoning problems, then optimizes it on refining its …
arxiv
Score 15.0
2026-04-16 · Peifeng Zhang, Zice Qiu, Donghua Yu, Shilei Cao, Juepeng Zheng, Yutong Lu, Haohuan Fu
Research Track A · General AI
In continual visual question answering (VQA), existing Continual Learning (CL) methods are mostly built for symmetric, unimodal architectures. However, modern Vision-Language Models (VLMs) violate this assumption, as their trainable components are inherently asymmetric. This structural mismatch renders VLMs highly pron…
arxiv
Score 15.0
2026-04-16 · Amirhosein Javadi, Tuomas Oikarinen, Tara Javidi, Tsui-Wei Weng
Research Track A · General AI
Catastrophic forgetting remains a fundamental challenge in continual learning, in which models often forget previous knowledge when fine-tuned on a new task. This issue is especially pronounced in class incremental learning (CIL), which is the most challenging setting in continual learning. Existing methods to address …
arxiv
Score 15.0
2026-04-18 · Dongkyu Cho, Xiyue Li, Samrachana Adhikari, Rumi Chunara
Research Track A · General AI
Continual learning aims to update models under distribution shift without forgetting, yet many high-stakes deployments, such as healthcare, also require interpretability. In practice, models that adapt well (e.g., deep networks) are often opaque, while models that are interpretable (e.g., decision trees) are brittle un…
arxiv
Score 15.0
2026-04-22 · Zeyu Shen, Peter Henderson
Research Track A · General AI
Mixture-of-Experts models, now popular for scaling capacity at fixed inference speed, switch experts at nearly every token. Once a model outgrows available GPU memory, this churn can render optimizations like offloading and pre-fetching ineffective. We make the case that the options framework in reinforcement learning …
arxiv
Score 15.0
2026-04-23 · Paul-Tiberiu Iordache, Elena Burceanu
Research Track A · General AI
Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defined by the trainable …
huggingface
Score 15.0
2026-04-25 · Yida Xue, Ningyu Zhang, Tingwei Wu, Zhe Ma, Daxiong Ji, Zhao Wang, Guozhou Zheng, Huajun Chen
General AI
The vast and underexplored ocean plays a critical role in regulating global climate and supporting marine biodiversity, yet artificial intelligence has so far delivered limited impact in this domain due to a fundamental data bottleneck. Specifically, ocean data are highly fragmented across disparate sources and inheren…
huggingface
Score 15.0
2026-04-27 · Yale Song, Yiwen Song, Nick Losier, Nathan Hodson, Ye Jin, Rhyard Zhu, Yan Xu, Daniel Vlasic, Carina Claassen, Jasmine Leon, Khanh G. LeViet, Zack Chomyn, Joe Timmons, Brett Slatkin, Scott Penberthy, Tomas Pfister
General AI
While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hier…
huggingface
Score 15.0
2026-04-27 · Qiliang Liang, Hansi Wang, Zhong Liang, Yang Liu
General AI
LLM agents increasingly rely on reusable skills, capability packages that combine instructions, control flow, constraints, and tool calls. In most current agent systems, however, skills are still represented by text-heavy artifacts, including SKILL.md-style documents and structured records whose machine-usable evidence…
arxiv
Score 15.0
2026-04-27 · Sivajeet Chand, Kevin Nguyen, Peter Kuntz, Alexander Pretschner
Research Track A · General AI
Large language models (LLMs) perform strongly on general-purpose code generation, yet their applicability to enterprise domain-specific languages (DSLs) remains underexplored, especially for repository-scale change generation spanning multiple files and folder structures from a single natural-language (NL) instruction.…
huggingface
Score 15.0
2026-04-27 · Chenkai Pan, Xinglong Xu, Yuhang Xu, Yujun Wu, Siyuan Li, Jintao Chen, Conghui He, Jingxuan Wei, Cheng Tan
General AI
Reliably transferring specialized human knowledge from text into large language models remains a fundamental challenge in artificial intelligence. Fine-tuning on domain corpora has enabled substantial capability gains, but the process operates without feedback: when a model fails on a domain task, there is no method to…
huggingface
Score 15.0
2026-04-27 · Jiaqi Wang, Wenhao Zhang, Weijie Shi, Yaliang Li, James Cheng
General AI
On-policy distillation (OPD) has shown strong potential for transferring reasoning ability from frontier or domain-specific models to smaller students. While effective on static single-turn tasks, its behavior in multi-turn agent settings remains underexplored. In this work, we identify a key limitation of vanilla OPD …
huggingface
Score 15.0
2026-04-28 · Lei Xiong, Kun Luo, Ziyi Xia, Wenbo Zhang, Jin-Ge Yao, Zheng Liu, Jingying Shao, Jianlyu Chen, Hongjin Qian, Xi Yang, Qian Yu, Hao Li, Chen Yue, Xiaan Du, Yuyang Wang, Yesheng Liu, Haiyu Xu, Zhicheng Dou
General AI
Autonomous scientific research has advanced significantly thanks to the development of AI agents. One key step in this process is finding the right scientific literature, whether to explore existing knowledge for a research problem or to acquire evidence for verifying assumptions and supporting claims. To assess AI age…
arxiv
Score 15.0
2026-05-12 · Phu-Hoa Pham, Chi-Nguyen Tran, Nguyen Lam Phu Quy, Dao Sy Duy Minh, Huynh Trung Kiet, Long Tran-Thanh
Research Track A · General AI
Streaming decision trees are natural candidates for open-world continual learning, as they perform local updates, enjoy bounded memory, and have static decision boundaries. Despite these properties, they still fail in online class-incremental learning due to two coupled miscalibrations: (i) their split criterion grows unreliable as th…
arxiv
Score 14.8
2026-03-30 · Seyed Parsa Neshaei, Richard Lee Davis, Tanja Käser
General AI
Reflective writing is known to support the development of students' metacognitive skills, yet learners often struggle to engage in deep reflection, limiting learning gains. Although large language models (LLMs) have been shown to improve writing skills, their use as conversational agents for reflective writing has prod…
arxiv
Score 14.8
2026-03-30 · Mih Dinh, SouYoung Jin
General AI
Large-scale image datasets frequently contain identifiable or sensitive content, raising privacy risks when training models that may memorize and leak such information. We present Unsafe2Safe, a fully automated pipeline that detects privacy-prone images and rewrites only their sensitive regions using multimodally guide…
arxiv
Score 14.8
2026-03-31 · Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh
General AI
AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We …
arxiv
Score 14.8
2026-04-02 · Alexander Pondaven, Ziyi Wu, Igor Gilitschenski, Philip Torr, Sergey Tulyakov, Fabio Pizzati, Aliaksandr Siarohin
General AI
Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we tackle a fundamental issue of action bin…
arxiv
Score 14.8
2026-04-06 · Xiangzhao Hao, Zefeng Zhang, Zhenyu Zhang, Linhao Yu, Yao Chen, Yiqian Zhang, Haiyun Guo, Shuohuan Wang, Yu Sun
General AI
Image degradation from blur, noise, compression, and poor illumination severely undermines multimodal understanding in real-world settings. Unified multimodal models that combine understanding and generation within a single architecture are a natural fit for this challenge, as their generative pathway can model the fin…
arxiv
Score 14.8
2026-04-06 · Yuhang Zhou, Lizhu Zhang, Yifan Wu, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao, Hong Yan
General AI
As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becomes orders of magnitude more expensive: while SWE tasks can be verified via fast-executing unit tests, MLE verification requires running full ML pipelines, including data prepro…
arxiv
Score 14.8
2026-04-06 · Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu
General AI
What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pip…
arxiv
Score 14.8
2026-04-07 · Naen Xu, Jiayi Sheng, Changjiang Li, Chunyi Zhou, Yuyuan Li, Tianyu Du, Jun Wang, Zhihui Fu, Jinbao Li, Shouling Ji
General AI
Puns are a common form of rhetorical wordplay that exploits polysemy and phonetic similarity to create humor. In multimodal puns, visual and textual elements synergize to ground the literal sense and evoke the figurative meaning simultaneously. Although Vision-Language Models (VLMs) are widely used in multimodal unders…
arxiv
Score 14.8
2026-04-07 · Hongxu Zhou
General AI
Intrinsic self-correction in Large Language Models (LLMs) frequently fails in open-ended reasoning tasks due to "hallucination snowballing," a phenomenon in which models recursively justify early errors during free-text reflection. While structured feedback can mitigate this issue, existing approaches often rely on e…
arxiv
Score 14.8
2026-04-09 · Addison J. Wu, Ryan Liu, Shuyue Stella Li, Yulia Tsvetkov, Thomas L. Griffiths
General AI
Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements. This creates the potential for LLM…
arxiv
Score 14.8
2026-04-28 · Xueying Zeng, Youquan Xian, Sihao Liu, Xudong Mou, Yanze Li, Lei Cui, Bo Li
Research Track A · General AI
With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models (LLMs) demonstrate remarkable sem…
arxiv
Score 14.8
2026-04-28 · Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen, Kai Xu, Quanjun Yin, Ee-Chien Chang
Research Track B · General AI
Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into webpage content to induce unintended actions. This threat is further amplified for screenshot-based web agents, which opera…
arxiv
Score 14.8
2026-05-07 · Weien Li, Rui Song, Zeyu Li, Haochen Liu, Gonghao Zhang, Difan Jiao, Zhenwei Tang, Bowei He, Haolun Wu, Xue Liu, Ye Yuan
General AI
Visual document retrieval has become essential for accessing information in visually rich documents. Existing approaches fall into two camps. Late-interaction retrievers achieve strong quality through fine-grained token-level matching but store hundreds of vectors per page, incurring large index footprints and high ser…
arxiv
Score 14.8
2026-05-07 · Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin
General AI
Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely purely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajec…
arxiv
Score 14.8
2026-05-07 · Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao
General AI
Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems, an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches eithe…
arxiv
Score 14.8
2026-05-10 · Zhiqing Zhong, Zhijing Ye, Jiamin Wang, Xiaodong Yu
Research Track B · General AI
Closed-loop tool-using agents are increasingly evaluated in executable web, code, and micro-task environments, but benchmark reports often conflate workloads, action-generating drivers, and the evidence admitted for systems-facing claims. We present an executable benchmarking suite that makes these objects explicit und…
arxiv
Score 14.8
2026-05-10 · Yilin Zhang, Yingkai Hua, Chunyu Wei, Xin Wang, Yueguo Chen
Research Track B · General AI
Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements. Existing approaches either detect deception without task integration or document attacks without proposing defenses. We formalize deception-aware web agent defense and pr…
arxiv
Score 14.6
2026-05-11 · Kainat Riaz, Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Ayesha Mohsin, Aqib Riaz, Ali Subhan, John M. Cioffi
General AI
Automated scientific discovery using large language models relies on identifying genuinely novel solutions. Standard reinforcement learning penalizes high-variance mutations, which leads the policy to prioritize familiar patterns. As a result, the maximum reward plateaus even as the average reward increases. Overcoming…
arxiv
Score 14.5
2026-03-22 · Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren
Research Track A · General AI
The continual learning literature has rapidly shifted from traditional class incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substanti…
arxiv
Score 14.5
2026-04-02 · Xinlei Yu, Zhangquan Chen, Yongbo He, Tianyu Fu, Cheng Yang, Chengming Xu, Yue Ma, Xiaobin Hu, Zhe Cao, Jie Xu, Guibin Zhang, Jiale Tao, Jiayi Zhang, Siyuan Ma, Kaituo Feng, Haojie Huang, Youxing Li, Ronghao Chen, Huacan Wang, Chenglin Wu, Zikun Su, Xiaogang Xu, Kelu Yao, Kun Wang, Chen Gao, Yue Liao, Ruqi Huang, Tao Jin, Cheng Tan, Jiangning Zhang, Wenqi Ren, Yanwei Fu, Yong Liu, Yu Wang, Xiangyu Yue, Yu-Gang Jiang, Shuicheng Yan
Research Track A · General AI
Latent space is rapidly emerging as a native substrate for language-based models. While modern systems are still commonly understood through explicit token-level generation, an increasing body of work shows that many critical internal processes are more naturally carried out in continuous latent space than in human-rea…
arxiv
Score 14.5
2026-04-03 · Linyu Li, Zhi Jin, Yichi Zhang, Dongming Jin, Yuanpeng He, Haoran Duan, Gadeng Luosang, Nyima Tashi
Research Track A · General AI
Real-world multimodal knowledge graphs (MMKGs) are dynamic, with new entities, relations, and multimodal knowledge emerging over time. Existing continual knowledge graph reasoning (CKGR) methods focus on structural triples and cannot fully exploit multimodal signals from new entities. Existing multimodal knowledge grap…
huggingface
Score 14.5
2026-04-07 · Weiyue Li, Ruizhi Qian, Yi Li, Yongce Li, Yunfan Long, Jiahui Cai, Yan Luo, Mengyu Wang
General AI
Large language models (LLMs) are widely explored for reasoning-intensive research tasks, yet resources for testing whether they can infer scientific conclusions from structured biomedical evidence remain limited. We introduce MedConclusion, a large-scale dataset of 5.7M PubMed structured abstracts for biomedical conclu…
huggingface
Score 14.5
2026-04-13 · Hanqi Xiao, Vaidehi Patil, Zaid Khan, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal
General AI
As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. We propose a novel p…
huggingface
Score 14.5
2026-04-17 · Sai Srinivas Kancheti, Aditya Sanjiv Kanade, Vineeth N. Balasubramanian, Tanuja Ganu
General AI
Multimodal Reasoning Models (MRMs) leveraging Chain-of-Thought (CoT) based thinking have revolutionized mathematical and logical problem-solving. However, we show that this paradigm struggles with generalized spatial intelligence. We perform a comprehensive evaluation of seventeen models across thirteen spatial benchma…
huggingface
Score 14.5
2026-04-18 · Bo Li, Ningyuan Deng, Tianyu Dong, Shaobo Wang, Shaolin Zhu, Lijie Wen
General AI
Multimodal large language models (MLLMs) have shown impressive capabilities, yet they often struggle to effectively capture the fine-grained textual information within images crucial for accurate image translation. This often leads to a modality gap between visual text inputs and textual inputs/outputs for image transl…
huggingface
Score 14.5
2026-04-19 · Xinyu Zhu, Yuzhu Cai, Zexi Liu, Cheng Wang, Fengyang Li, Wenkai Jin, Wanxu Liu, Zehao Bing, Bingyang Zheng, Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xianghe Pang, Yaxin Du, Tingjia Miao, Yuzhi Zhang, Ruoxue Liao, Zhaohan Ding, Linfeng Zhang, Yanfeng Wang, Weinan E, Siheng Chen
General AI
The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we pres…
huggingface
Score 14.5
2026-04-19 · Yueyang Ding, HaoPeng Zhang, Rui Dai, Yi Wang, Tianyu Zong, Kaikui Liu, Xiangxiang Chu
General AI
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge …
huggingface
Score 14.5
2026-04-20 · Yejin Yoon, Minseo Kim, Taeuk Kim
General AI
Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we…
huggingface
Score 14.5
2026-04-20 · Sua Lee, Sanghee Park, Jinbae Im
General AI
Multimodal Large Language Models (MLLMs) have been increasingly used as automatic evaluators, a paradigm known as MLLM-as-a-Judge. However, their reliability and vulnerabilities to biases remain underexplored. We find that many MLLM judges fail to reliably integrate key visual or textual cues, yielding unreliable evalua…
huggingface
Score 14.5
2026-04-20 · Yilei Jiang, Jinyuan Hu, Qianyin Xiao, Yaozhi Zheng, Ruize Ma, Kaituo Feng, Jiaming Han, Tianshuo Peng, Kaixuan Fan, Manyuan Zhang, Xiangyu Yue
General AI
Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (LLMs) and code agents now solve isolated programming tasks with ease, they consis…
huggingface
Score 14.5
2026-04-23 · Vipula Rawte, Ryan Rossi, Franck Dernoncourt, Nedim Lipka
General AI
Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation poses significant risks in high-stakes domains such as healthcare, law, and scientific communication, where trust and veri…
huggingface
Score 14.5
2026-04-26 · Fanqing Meng, Lingxiao Du, Zijian Wu, Guanzheng Chen, Xiangyan Liu, Jiaqi Liao, Chonghe Jiang, Zhenglin Wan, Jiawei Gu, Pengfei Zhou, Rui Huang, Ziqi Zhao, Shengyuan Ding, Ailing Yu, Bo Peng, Bowei Xia, Hao Sun, Haotian Liang, Ji Xie, Jiajun Chen, Jiajun Song, Liu Yang, Ming Xu, Qionglin Qiu, Runhao Fu, Shengfang Zhai, Shijian Wang, Tengfei Ma, Tianyi Wu, Weiyang Jin, Yan Wang, Yang Dai, Yao Lai, Youwei Shu, Yue Liu, Yunzhuo Hao, Yuwei Niu, Jinkai Huang, Jiayuan Zhuo, Zhennan Shen, Linyu Wu, Cihang Xie, Yuyin Zhou, Jiaheng Zhang, Zeyu Zheng, Mengkang Hu, Michael Qizhe Shieh
General AI
Language-model agents are increasingly used as persistent coworkers that assist users across multiple working days. During such workflows, the surrounding environment may change independently of the agent: new emails arrive, calendar entries shift, knowledge-base records are updated, and evidence appears across images,…
huggingface
Score 14.4
2026-05-04 · Junjie Yu, Pengrui Lu, Weiye Si, Hongliang Lu, Jiabao Wu, Kaiwen Tao, Kun Wang, Lingyu Yang, Qiran Zhang, Xiuting Guo, Xuanyu Wang, Yang Wang, Yanjie Wang, Yi Yang, Zijian Hu, Ziyi Yang, Zonghan Zhou, Binghao Qiang, Borui Zhang, Chenning Li, Enchang Zhang, Feifan Chen, Feng Jian, Fengyin Sun, Hao Qiu, Hao Zheng, Haoran Zhu, Hongyu Liu, Jianbin Deng, Jiaxin Song, Jiaying Chi, Jiayou Shi, Jie Fang, Jinghui Zhong, Jingyu Zhou, Jinze Li, Junfeng Yi, Junyan Yu, Junzhi Xue, Ni Song, Pengyi Chen, Qi Chen, Quansheng Li, Rui Tao, Shenghai Gong, Shenhang Lu, Tianqi Shen, Tianxiang Zhu, Tiehan Kang, Tingyu Li, Wendi Wu, Xiao Shen, Xiao Zhou, Xiaotao Zhang, Xinrong Li, Xuankun Yang, Xun Zhang, Yan Li, Ye Lu, Yi Wang, Yibo Zhou, Yichi Zhang, Yihao Sun, Yijun Huang, Yixin Zhu, Yixuan Wu, Yuchen Sun, Yue Wu, Yuheng Sun, Yukun Li, Yutian Tu, Yuxuan Qin, Yuzhuo Wu, Zeyu Li, Zhengyu Lou, Zhenning Ran, Zizhu He, Pengfei Liu
General AI
Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' real academic workflows…
arxiv
Score 14.4
2026-05-04 · Thanasis Pantsios, Dimitrios Karageorgiou, Christos Koutlis, George Karantaidis, Olga Papadopoulou, Symeon Papadopoulos
Research Track A · General AI
The rapid advancement of generative Artificial Intelligence (AI) has introduced significant challenges for reliable AI-generated image detection. Existing detectors often suffer from performance degradation under distribution shifts and when encountering newly emerging generative models. In this work, we propose a data…
arxiv
Score 14.3
2026-04-13 · Yuqian Yuan, Wenqiao Zhang, Juekai Lin, Yu Zhong, Mingjian Gao, Binhe Yu, Yunqi Cao, Wentong Li, Yueting Zhuang, Beng Chin Ooi
General AI
Large Multimodal Models (LMMs) have achieved remarkable progress in general-purpose vision-language understanding, yet they remain limited in tasks requiring precise object-level grounding, fine-grained spatial reasoning, and controllable visual manipulation. In particular, existing systems often struggle to identify …
arxiv
Score 14.3
2026-04-14 · Jaywon Koo, Jefferson Hernandez, Ruozhen He, Hanjie Chen, Chen Wei, Vicente Ordonez
General AI
We introduce HypoExplore, an agentic framework that formulates neural architecture discovery for visual recognition as a hypothesis-driven scientific inquiry. Given a human-specified high-level research direction, HypoExplore ideates, implements, evaluates, and improves neural architectures through evolutionary branchi…
arxiv
Score 14.3
2026-04-14 · Muhammad Kamran Janjua, Hugo Silva, Di Niu, Bahador Rashidi
General AI
Multimodal language models (MLLMs) are increasingly paired with vision tools (e.g., depth, flow, correspondence) to enhance visual reasoning. However, despite access to these tool-generated visual cues, MLLMs often fail to benefit from them. Existing approaches typically feed raw tool outputs into the model, but these …
arxiv
Score 14.3
2026-04-14 · Han Bao, Penghao Zhang, Yue Huang, Zhengqing Yuan, Yanchi Ru, Rui Su, Yujun Zhou, Xiangqi Wang, Kehan Guo, Nitesh V Chawla, Yanfang Ye, Xiangliang Zhang
General AI
Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present PolicyBench, the first large-scale cross-syst…
arxiv
Score 14.3
2026-04-14 · Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu
Research Track B · General AI
Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces, where sub-pixel accuracy is required to interact with dense IDE elements, remains underexplored. Existing a…
arxiv
Score 14.3
2026-04-16 · Federico Pierucci, Matteo Prandi, Marcantonio Bracale Syrnikov, Marcello Galisai, Piercosma Bisconti
General AI
This paper advances a methodological proposal for safety research in agentic AI. As systems acquire planning, memory, tool use, persistent identity, and sustained interaction, safety can no longer be analysed primarily at the level of the isolated model. Population-level risks arise from structured interaction among ag…
arxiv
Score 14.3
2026-04-16 · Moin Aminnaseri, Farima Fatahi Bayat, Nikita Bhutani, Jean-Flavien Bussotti, Kevin Chan, Rafael Li Chen, Yanlin Feng, Jackson Hassell, Estevam Hruschka, Eser Kandogan, Hannah Kim, James Levine, Seiji Maekawa, Jalal Mahmud, Kushan Mitra, Naoki Otani, Pouya Pezeshkpour, Nima Shahbazi, Chen Shen, Dan Zhang
General AI
NL2SQL systems aim to address the growing need for natural language interaction with data. However, real-world information rarely maps to a single SQL query because (1) users express queries iteratively, (2) questions often span multiple data sources beyond the closed-world assumption of a single database, and (3) queri…
arxiv
Score 14.3
2026-04-16 · XiangRui Zhang, Qiang Li, Haining Wang
General AI
Binary analysis increasingly relies on large language models (LLMs) to perform semantic reasoning over complex program behaviors. However, existing approaches largely adopt a one-pass execution paradigm, where reasoning operates over a fixed program representation constructed by static analysis tools. This formulation …
arxiv
Score 14.3
2026-04-16 · Alexey Khoroshilov, Alexey Chernysh, Orkhan Ekhtibarov, Nini Kamkia, Dmitry Zmitrovich
General AI
Large language models have demonstrated strong performance on general-purpose programming tasks, yet their ability to generate executable algorithmic trading strategies remains underexplored. Unlike standard code benchmarks, trading-strategy generation requires simultaneous mastery of domain-specific financial logic, k…
arxiv
Score 14.3
2026-04-17 · Vitor F. Grizzi, Thang Duc Pham, Luke N. Pretzie, Jiayi Xu, Murat Keceli, Cong Liu
General AI
Computational X-ray absorption near-edge structure (XANES) is widely used to probe local coordination environments, oxidation states, and electronic structure in chemically complex systems. However, the use of computational XANES at scale is constrained more by workflow complexity than by the underlying simulation meth…
arxiv
Score 14.3
2026-04-17 · Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song
General AI
Although most automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language models' (LLMs) strength in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the diff…
arxiv
Score 14.3
2026-04-17 · Deshan Sumanathilaka, Nicholas Micallef, Julian Hough, Saman Jayasinghe
General AI
Recent advances in language models have substantially improved Natural Language Understanding (NLU). Although widely used benchmarks suggest that Large Language Models (LLMs) can effectively disambiguate, their practical applicability in real-world narrative contexts remains underexplored. SemEval-2026 Task 5 addresses…
arxiv
Score 14.3
2026-04-17 · Siddhant Bharadwaj, Ashish Vashist, Fahimul Aleem, Shruti Vyas
General AI
Image geolocalization has traditionally been addressed through retrieval-based place recognition or geometry-based visual localization pipelines. Recent advances in Vision-Language Models (VLMs) have demonstrated strong zero-shot reasoning capabilities across multimodal tasks, yet their performance in geographic infere…
arxiv
Score 14.3
2026-04-19 · Mohit Dubey
Research Track B · General AI
Multi-agent systems (MAS) powered by large language models suffer from severe token inefficiency arising from two compounding sources: (i) unstructured parallel execution, where all agents activate simultaneously irrespective of input readiness; and (ii) unrestricted context sharing, where every agent receives the full…
arxiv
Score 14.3
2026-04-20 · Ghazal Khalighinejad, Raghuveer Thirukovalluru, Alexander H. Oh, Bhuwan Dhingra
General AI
Many recent document embedding models are trained on document-as-image representations, embedding rendered pages as images rather than the underlying source. Meanwhile, existing benchmarks for scientific document retrieval, such as ArXivQA and ViDoRe, treat documents as images of pages, implicitly favoring such represe…
arxiv
Score 14.3
2026-04-20 · Liubomyr Horbatko
General AI
Modern sequence models are dominated by Transformers, where self-attention mixes information from the visible context in an input-dependent way. However, when retrieval is not sharp and attention remains diffuse over an effective support $S_{\mathrm{eff}}(t)$, the influence of any individual token is diluted, typically…
arxiv
Score 14.3
2026-04-22 · Marisa Hudspeth, Patrick J. Burns, Brendan O'Connor
General AI
We introduce a benchmark dataset for question answering and translation in bilingual Latin and English settings, containing about 7,800 question-answer pairs. The questions are drawn from Latin pedagogical sources, including exams, quizbowl-style trivia, and textbooks ranging from the 1800s to the present. After automa…
arxiv
Score 14.3
2026-04-22 · Fulong Fan, Peilin Liu, Fengzhe Liu, Shuyan Yang, Gang Yan
General AI
Large language models perform well on many reasoning tasks, yet they often lack awareness of whether their current knowledge or reasoning state is complete. In non-interactive puzzle settings, the narrative is fixed and the underlying structure is hidden; once a model forms an early hypothesis under incomplete premises…
arxiv
Score 14.3
2026-04-23 · Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong, Yunzhi Yao, Yuqi Zhu, Lun Du, Shumin Deng
Research Track A · General AI
Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approaches face a fundamental trade-off: flat memory is efficient but fails to model relational structure, while graph-based m…
arxiv
Score 14.3
2026-04-23 · Maximilian Stralz, Meshal Alharbi, Yujun Huang, Gioele Zardini
General AI
Designing multi-agent robotic systems requires reasoning across tightly coupled decisions spanning heterogeneous domains, including robot design, fleet composition, and planning. Much effort has been devoted to isolated improvements in these domains, whereas system-level co-design considering trade-offs and task requir…
arxiv
Score 14.3
2026-04-24 · Erez Yosef, Oron Anschel, Shunit Haviv Hakimi, Asaf Gendler, Adam Botach, Nimrod Berman, Igor Kviatkovsky
General AI
Recent advancements in large language models have led to significant improvements across various tasks, including mathematical reasoning, which is used to assess models' intelligence in logical reasoning and problem-solving. Models are evaluated on mathematical reasoning benchmarks by verifying the correctness of the f…
arxiv
Score 14.3
2026-04-27 · Lirong Gao, Zeqing Wang, Yuyan Cai, Jiayi Deng, Yanmei Gu, Yiming Zhang, Jia Zhou, Yanfei Zhang, Junbo Zhao
General AI
While Large Language Models (LLMs) have increasingly assisted in historical tasks such as text processing, their capacity for professional-level historical reasoning remains underexplored. Existing benchmarks primarily assess basic knowledge breadth or lexical understanding, failing to capture the higher-order skills, …
arxiv
Score 14.3
2026-05-12 · Christos Ziakas, Alessandra Russo, Avishek Joey Bose
General AI
Generative policies based on expressive model classes, such as diffusion and flow matching, are well-suited to complex control problems with highly multimodal action distributions. Their expressivity, however, comes at a significant inference cost: generating each action typically requires simulating many steps of the …
arxiv
Score 14.3
2026-05-12 · Hannes Büchi, Manon Flageat, Eduardo Sebastián, Amanda Prorok
General AI
Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve, and to do so at the right moment. Yet current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity are still limited by the fact that they bind fixed behaviors to fixed agent identities…
arxiv
Score 14.2
2026-05-04 · Yuecai Zhu, Nikolaos Tsantalis, Peter C. Rigby
General AI
The promise of Large Language Models in automated software engineering is often measured by functional correctness, overlooking the critical issue of long term maintainability. This paper presents a systematic audit of technical debt in AI-generated software, revealing that AI does not eliminate flaws but rather introd…
arxiv
Score 14.2
2026-05-04 · Ruichao Liang, Jing Chen, Xianglong Li, Huangpeng Gu, Yebo Feng, Yue Xue, Cong Wu, Yang Liu
General AI
Smart contract vulnerabilities in Decentralized Finance cause billions of dollars in losses every year, yet the security community faces a critical bottleneck: identifying a vulnerability is not the same as proving it is exploitable. Manual PoC construction is prohibitively labor-intensive, leaving most disclosed vu…
huggingface
Score 14.0
2026-03-20 · Chiyu Ma, Shuo Yang, Kexin Huang, Jinda Lu, Haoming Meng, Shangshang Wang, Bolin Ding, Soroush Vosoughi, Guoyin Wang, Jingren Zhou
General AI
We present Future-KL Influenced Policy Optimization (FIPO), a reinforcement learning algorithm designed to overcome reasoning bottlenecks in large language models. While GRPO-style training scales effectively, it typically relies on outcome-based rewards (ORM) that distribute a global advantage uniformly across every t…
huggingface
Score 14.0
2026-03-30 · Zefeng He, Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Yu Cheng, Yang Yang
General AI
Recent multimodal generation models have achieved remarkable progress on general-purpose generation tasks, yet continue to struggle with complex instructions and specialized downstream tasks. Inspired by the success of advanced agent frameworks such as Claude Code, we propose GEMS (Agent-Native Multimodal GEneration wi…
huggingface
Score 14.0
2026-03-31 · Zhaohe Liao, Kaixun Jiang, Zhihang Liu, Yujie Wei, Junqiu Yu, Quanhao Li, Hong-Tao Yu, Pandeng Li, Yuzheng Wang, Zhen Xing, Shiwei Zhang, Chen-Wei Xie, Yun Zheng, Xihui Liu
General AI
Although image generation has boosted various applications through its rapid evolution, whether state-of-the-art models can produce ready-to-use academic illustrations for papers remains largely unexplored. Directly comparing or evaluating the illustration with a VLM is naive but requires oracle multi-modal und…
arxiv
Score 14.0
2026-03-31 · Xiaoyan Zhang, Jiangpeng He
Research Track A · General AI
Visual food recognition in real-world dietary logging scenarios naturally exhibits severe data imbalance, where a small number of food categories appear frequently while many others occur rarely, resulting in long-tailed class distributions. In practice, food recognition systems often operate in a continual learning se…
huggingface
Score 14.0
2026-04-02 · Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, Jiacheng Zhu, Xuan Jiang, Sirui Li, Cathy Wu, Bryan Kian Hsiang Low, Jinhua Zhao, Paul Pu Liang
General AI
Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We present CORAL, the first …
huggingface
Score 14.0
2026-04-05 · Haonian Ji, Kaiwen Xiong, Siwei Han, Peng Xia, Shi Qiu, Yiyang Zhou, Jiaqi Liu, Jinlong Li, Bingzhou Li, Zeyu Zheng, Cihang Xie, Huaxiu Yao
General AI
AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface through corrections rath…
huggingface
Score 14.0
2026-04-06 · Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang, Shuofei Qiao, Kexin Cao, Guozhou Zheng, Xiang Qi, Peng Zhang, Shumin Deng
General AI
Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this …
huggingface
Score 14.0
2026-04-06 · Chaoyou Fu, Haozhi Yuan, Yuhao Dong, Yi-Fan Zhang, Yunhang Shen, Xiaoxing Hu, Xueying Li, Jinsen Su, Chengwu Long, Xiaoyao Xie, Yongkang Xie, Xiawu Zheng, Xue Yang, Haoyu Cao, Yunsheng Wu, Ziwei Liu, Xing Sun, Caifeng Shan, Ran He
General AI
With the rapid advancement of video understanding, existing benchmarks are becoming increasingly saturated, exposing a critical discrepancy between inflated leaderboard scores and real-world model capabilities. To address this widening gap, we introduce Video-MME-v2, a comprehensive benchmark designed to rigorously eva…
arxiv
Score 14.0
2026-04-22 · Sachin Kumar
Research Track B · General AI
Can small language models achieve strong tool-use performance without complex adaptation mechanisms? This paper investigates this question through Meta-Tool, a controlled empirical study comparing hypernetwork-based LoRA adaptation against carefully designed few-shot prompting. Using a Llama-3.2-3B-Instruct backbone, w…
arxiv
Score 14.0
2026-04-23 · Yi-Ling Liu, Melvin Laux, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam
Research Track A · General AI
Autonomous underwater vehicles are required to perform multiple tasks adaptively and in an explainable manner under dynamic, uncertain conditions and limited sensing, challenges that classical controllers struggle to address. This demands robust, generalizable, and inherently interpretable control policies for reliable…
huggingface
Score 14.0
2026-04-28 · Yuxin Zhang, Xiangyu Tony Zhang, Daijiao Liu, Fei Tian, Yayue Deng, Jun Chen, Qingjian Lin, Haoyang Zhang, Yuxin Li, Jinglan Gong, Yechang Huang, Liang Zhao, Chengyuan Yao, Hexin Liu, Eng Siong Chng, Xuerui Yang, Gang Yu, Xiangyu Zhang, Daxin Jiang
General AI
Recent advancements in large audio language models have extended Chain-of-Thought (CoT) reasoning into the auditory domain, enabling models to tackle increasingly complex acoustic and spoken tasks. To elicit and sustain these extended reasoning chains, the prevailing paradigm -- driven by the success of text-based reas…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-05-07 · Yuxing Liu, Jianyu Wang, Tong Zhang
Research Track A · General AI
Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., forgetting less while achieving the same o…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.0
2026-05-09 · Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, H. Vincent Poor, Christopher G. Brinton
General AI
Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraint…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-03-20 · Xuanwang Zhang, Yuteng Han, Jinnan Qi, Mulong Xie, Zhen Wu, Xinyu Dai
Research Track B · General AI
Despite significant advances in autonomous web navigation, current methods remain far from human-level performance in complex web environments. We argue that this limitation stems from Topological Blindness, where agents are forced to explore via trial-and-error without access to the global topological structure of the…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-03-26 · Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava
General AI
We present an empirical study of how far general-purpose coding agents -- without hardware-specific training -- can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-03-26 · Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz
General AI
Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-referenced overhead imagery, enabling GPS-denied localization and navigation. Existing methods almost universally formulate CVGL as an image-retrieval problem in a contrastively trained embedding space. This ties per…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-03-27 · Shanglin Wu, Yuyang Luo, Yueqing Liang, Kaiwen Shi, Yanfang Ye, Ali Payani, Kai Shu
Research Track A · General AI
Large language model (LLM) multi-agent systems can scale along two distinct dimensions: by increasing the number of agents and by improving through accumulated experience over time. Although prior work has studied these dimensions separately, their interaction under realistic cost constraints remains unclear. In this p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-03-30 · Iman Sharifi, Alex Zongo, Peng Wei
General AI
The growing deployment of small Unmanned Aerial Systems (sUASs) in low-altitude airspaces has increased the need for reliable tactical deconfliction under safety-critical constraints. Tactical deconfliction involves short-horizon decision-making in dense, partially observable, and heterogeneous multi-agent environments…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-03-31 · Iulian Lucău, Adelin-George Voicu
General AI
This paper evaluates whether commercial large language models (LLMs) can function as reliable political advisory tools by comparing their outputs against official legislative reasoning. Using a dataset of 15 Romanian Senate law proposals paired with their official explanatory memoranda (expuneri de motive), we test six…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-03-31 · Shi Li, Vinkle Srivastav, Nicolas Chanel, Saurav Sharma, Nabani Banik, Lorenzo Arboit, Kun Yuan, Pietro Mascagni, Nicolas Padoy
General AI
Surgical procedures are inherently complex and risky, requiring extensive expertise and constant focus to navigate evolving intraoperative scenes effectively. Computer-assisted systems such as surgical visual question answering (VQA) offer promise for education and intraoperative support. Current surgical VQA research largel…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-03-31 · Xue Jiang, Tianyu Zhang, Ge Li, Mengyang Liu, Taozhi Chen, Zhenhua Xu, Binhua Li, Wenpin Jiao, Zhi Jin, Yongbin Li, Yihong Dong
General AI
Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer. However, this approach suffers from critical limitations in code generation, where upfront thinking is often insufficient because a problem's full complexity only reveals itself duri…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-02 · Bangji Yang, Hongbo Ma, Jiajun Fan, Ge Liu
General AI
Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning quality or require comp…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-02 · Jona Ruthardt, Manu Gaur, Deva Ramanan, Makarand Tapaswi, Yuki M. Asano
General AI
Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to focus on the most salient visual cues in the image, with no way to direct them towar…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-02 · Gengsheng Li, Tianyu Yang, Junfeng Fang, Mingyang Song, Mao Zheng, Haiyun Guo, Dan Zhang, Jinqiao Wang, Tat-Seng Chua
General AI
Reinforcement learning with verifiable rewards (RLVR) has become a standard paradigm for post-training large language models. While Group Relative Policy Optimization (GRPO) is widely adopted, its coarse credit assignment uniformly penalizes failed rollouts, lacking the token-level focus needed to efficiently address s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-04 · Ying Yao
Research Track A · General AI
Unsustainable land-use practices in ecologically sensitive regions threaten biodiversity, water resources, and the livelihoods of millions. This paper presents a deep reinforcement learning (RL) framework for optimizing land-use allocation in the Lake Malawi Basin to maximize total ecosystem service value (ESV). Drawin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-06 · Hengrui Gu, Xiaotian Han, Yujing Bian, Kaixiong Zhou
General AI
Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models (LLMs). However, it faces a fundamental limitation termed restricted exploration, where the policy rapidly converges to a narrow set of solutions. While entropy regularization is…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-07 · Shao Wang, Rui Ren, Lin Gui
General AI
The serving paradigm of large language models (LLMs) is rapidly shifting towards complex multi-agent workflows where specialized agents collaborate over massive shared contexts. While Low-Rank Adaptation (LoRA) enables the efficient co-hosting of these specialized agents on a single base model, it introduces a critical…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-07 · Changgeon Ko, Jisu Shin, Hoyun Song, Huije Lee, Eui Jun Hwang, Jong C. Park
General AI
Large language model (LLM) agents are increasingly acting as human delegates in multi-agent environments, where a representative agent integrates diverse peer perspectives to make a final decision. Drawing inspiration from social psychology, we investigate how the reliability of this representative agent is undermined …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-07 · Jintao Sun, Hu Zhang, Donglin Di, Gangyi Ding, Zhedong Zheng
General AI
Vision-Language models (VLMs) have demonstrated remarkable capability in ground-view visual understanding but often fracture when deployed on high-altitude Unmanned Aerial Vehicles (UAVs). The failure largely stems from a pronounced domain shift, characterized by tiny and densely packed objects, repetitive textures, an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-08 · Bingxuan Li, Simo Du, Yue Guo
Research Track A · General AI
Clinical expertise improves not only by acquiring medical knowledge, but by accumulating experience that yields reusable diagnostic patterns. Recent LLM-based diagnostic agents have shown promising progress in clinical reasoning for decision support. However, most approaches treat cases independently, limiting experie…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-09 · Shiwan Zhao, Zhihu Wang, Xuyang Zhao, Jiaming Zhou, Caiyue Xu, Chenfei Liu, Liting Zhang, Yuhang Jia, Yanzhe Zhang, Hualong Yu, Zichen Xu, Qicheng Li, Yong Qin
Research Track A · General AI
Post-training has become central to turning pretrained large language models (LLMs) into aligned and deployable systems. Recent progress spans supervised fine-tuning (SFT), preference optimization, reinforcement learning (RL), process supervision, verifier-guided methods, distillation, and multi-stage pipelines. Yet th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-09 · Haolei Xu, Haiwen Hong, Hongxing Li, Rui Zhou, Yang Zhang, Longtao Huang, Hui Xue, Yongliang Shen, Weiming Lu, Yueting Zhuang
General AI
Multimodal Mixture-of-Experts (MoE) models have achieved remarkable performance on vision-language tasks. However, we identify a puzzling phenomenon termed Seeing but Not Thinking: models accurately perceive image content yet fail in subsequent reasoning, while correctly solving identical problems presented as pure tex…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-09 · Xingyu Xia, Lekai Zhou, Yujie Tang, Xiaozhou Zhu, Hai Zhu, Wen Yao
General AI
Aerial vision-and-language navigation (Aerial VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and autonomously navigate complex three-dimensional environments by grounding language in visual perception. This survey provides a critical and analytical review of the Aerial VL…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-09 · Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja, Jen-tse Huang, Graham Neubig
General AI
Large language models (LLMs) can perform remarkably complex tasks, yet the fine-grained details of how these capabilities emerge during pretraining remain poorly understood. Scaling laws on validation loss tell us how much a model improves with additional compute, but not what skills it acquires in which order. To reme…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-28 · Zhou Hanlin, Chan Huah Yong
General AI
Long-horizon LLM tasks often fail not because a single answer is unattainable, but because knowledge states drift across rounds, intermediate commitments remain implicit, and interruption fractures the evolving evidence chain. This paper presents ADEMA as a knowledge-state orchestration architecture for long-horizon kn…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-04-28 · Hector G. Rodriguez, Marcus Rohrbach
General AI
Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering benchmarks approach saturation, reliable deployment requires satisfying low error tolerances in real-world out-of-distribution (OOD) scenarios. Specifically, selective predicti…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-06 · Pietro Bonazzi, Youssef Ahmed, Daniel Eckert, Andrea Ronco, Junjie Zeng, Dengxin Dai, Michele Magno
General AI
Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multi-modal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commer…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-07 · Hao Dong, Hongzhao Li, Shupan Li, Muhammad Haris Khan, Eleni Chatzi, Olga Fink
General AI
Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-07 · Yangfu Zhu, Zitong Han, Nianwen Ning, Yuting Wei, Yuandong Wang, Hang Feng, Zhenzhou Shao
General AI
Multimodal personality understanding plays a critical role in human-centered artificial intelligence. Previous work mainly focuses on learning rich multimodal representations for video personality understanding. However, they often suffer from potential harm caused by subject bias (e.g., observable age and unobservable mental …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-07 · Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang
General AI
Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, prim…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-03-17 · Shuvam Banerji Seal, Aheli Poddar, Alok Mishra, Dwaipayan Roy
General AI
This paper introduces AgriIR, a configurable retrieval augmented generation (RAG) framework designed to deliver grounded, domain-specific answers while maintaining flexibility and low computational cost. Instead of relying on large, monolithic models, AgriIR decomposes the information access process into declarative mo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.5
2026-04-01 · Xiao Zhang, Juntao Lyu, Tianyu Hu, Qianchuan Zhao, Huimin Ma
Research Track A · General AI
Large Language Models (LLMs) generalize across tasks via reusable representations and flexible reasoning, yet remain brittle in real deployment under evolving tasks and continual distribution shift. A common approach is Test-Time Adaptation (TTA); existing TTA methods update models with hand-designed unsupervised ob…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.5
2026-04-01 · Zhanzhi Lou, Hui Chen, Yibo Li, Qian Wang, Bryan Hooi
Research Track B · General AI
Test-Time Learning (TTL) enables language agents to iteratively refine their performance through repeated interactions with the environment at inference time. At the core of TTL is an adaptation policy that updates the actor policy based on experience from previous episodes, thereby improving future behavior. Existing …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-12 · Sandro Andric
General AI
Large language models are increasingly used as agents in social, economic, and policy simulations. A common assumption is that stronger reasoning should improve simulation fidelity. We argue that this assumption can fail when the objective is not to solve a strategic problem, but to sample plausible boundedly rational …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-13 · Yuqing Yang, Tengxiao Liu, Wang Bill Zhu, Taiwei Shi, Linxin Song, Robin Jia
General AI
As LLM-based assistants become persistent and personalized, they must extract and retain useful information from past conversations as memory. However, the types of information worth remembering vary considerably across tasks. We formalize the heterogeneous memory extraction task and introduce BEHEMOTH, a benchmark tha…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-16 · Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh
General AI
Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results: it never sees how the corpus is organized or what it has not yet retrieved, limiting its ability to backtrack or combine scattered evidence. We present Corpus2Skill, which distil…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-16 · Joongwon Kim, Wannan Yang, Kelvin Niu, Hongming Zhang, Yun Zhu, Eryk Helenowski, Ruan Silva, Zhengxing Chen, Srinivasan Iyer, Manzil Zaheer, Daniel Fried, Hannaneh Hajishirzi, Sanjeev Arora, Gabriel Synnaeve, Ruslan Salakhutdinov, Anirudh Goyal
General AI
Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined. Long-horizon coding agents violate this premise: each attempt produces an extended trajectory of actions, observations, erro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.5
2026-04-17 · Eunju Lee, MiHyeon Kim, JuneHyoung Kwon, Yoonji Lee, JiHyun Kim, Soojin Jang, YoungBin Kim
Research Track A · General AI
Pretrained Vision-Language Models (VLMs) like CLIP show promise in continual learning, but existing Few-Shot Class-Incremental Learning (FSCIL) methods assume homogeneous domains and balanced data distributions, limiting real-world applicability where data arises from heterogeneous disciplines with imbalanced sample av…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-24 · Bin Wu, Arastun Mammadli, Xiaoyu Zhang, Emine Yilmaz
General AI
The rapid growth of AI agent ecosystems is transforming how complex tasks are delegated and executed, creating a new challenge of identifying suitable agents for a given task. Unlike traditional tools, agent capabilities are often compositional and execution-dependent, making them difficult to assess from textual descr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.4
2026-05-01 · Zi-Bo Qin, Feng-Feng Wei, Tai-You Chen, Wei-Neng Chen
General AI
Distributed black-box consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns, which often struggle…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.4
2026-05-01 · Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin
General AI
Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-13 · Buseong Kim, Heejun Gwon
Research Track A · General AI
In large language models performing long-form reasoning, the KV cache grows rapidly with decode length, creating bottlenecks in memory and inference stability. Existing reasoning-oriented KV compression has mostly followed an eviction-centered view: estimate token importance more accurately, then discard lower-ranked e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-13 · Jiayuan Rao, Tianlin Gui, Haoning Wu, Yanfeng Wang, Weidi Xie
General AI
Modeling open-play soccer tactics is a formidable challenge due to the stochastic, multi-agent nature of the game. Existing computational approaches typically produce single, deterministic trajectory forecasts or focus on highly structured set-pieces, fundamentally failing to capture the inherent variance and branching…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-14 · Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang
General AI
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image, and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is not well characterized by existing benchmarks, which inherently contain indeter…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-14 · Serdar Kadioglu, Karthik Uppuluri, Akash Singirikonda
General AI
There is growing interest in leveraging large language models (LLMs) for text-to-model translation and optimization tasks. This paper aims to advance this line of research by introducing Text2Model and Text2Zinc. Text2Model is a suite of co-pilots based on several LLM strategies with varying …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-16 · Zihao Xu, John Harvill, Ziwei Fan, Yizhou Sun, Hao Ding, Hao Wang
General AI
Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the number of tokens representing inputs. However, existing prompt-compression approache…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-17 · Sarthak Mittal, Leo Gagnon, Guillaume Lajoie
General AI
Frontier models have demonstrated exceptional capabilities following the integration of task-reward-based reinforcement learning (RL) into their training pipelines, enabling systems to evolve from pure reasoning models into sophisticated agents. However, debate persists regarding whether RL genuinely instills new skill…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-17 · Shriram Chennakesavalu, Kirill Shmilovich, Hayley Weir, Colin Grambow, John Bradshaw, Patricia Suriana, Chen Cheng, Kangway Chuang
General AI
Large Language Models (LLMs) have the potential to accelerate small molecule drug design due to their ability to reason about information from diverse sources and formats. However, their practical utility remains unclear due to the lack of benchmarks that reflect real-world scenarios. In this work, we introduce a suite…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-20 · Joonhyuk Lee, Virginia Ma, Sarah Zhao, Yash Nair, Asher Spector, Regev Cohen, Emmanuel J. Candès
General AI
Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground truth acquisition can be time-consuming and expensive. We introduce Fully Unsupervi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-20 · HaeJun Yoo, Yongseop Shin, Insung Lee, Myoung-Wan Koo, Du-Seong Chang
General AI
Audio-text retrieval systems based on Contrastive Language-Audio Pretraining (CLAP) achieve strong performance on traditional benchmarks; however, these benchmarks rely on caption-style queries that differ substantially from real-world search behavior, limiting their assessment of practical retrieval robustness. We pre…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-20 · Yakoub Bazi, Mohamad M. Al Rahhal, Mansour Zuair, Faroun Mohamed
General AI
Change visual question answering (Change VQA) addresses the problem of answering natural-language questions about semantic changes between bi-temporal remote sensing (RS) images. Although vision-language models (VLMs) have recently been studied for temporal RS image understanding, Change VQA remains underexplored in th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-20 · Salman Rahman, Jingyan Shen, Anna Mordvina, Hamid Palangi, Saadia Gabriel, Pavel Izmailov
General AI
Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward signals becomes increasingly difficult, making it essential to understand when RLVR can succeed under weaker forms of sup…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-21 · Zhiyuan Peng, Wei Tao, Xin Yin, Chenhao Ying, Yuan Luo, Yiwen Guo
General AI
Large language models (LLMs) have achieved strong results in code generation, but their ability to generate GUI applications, especially games, remains insufficiently studied. Existing benchmarks mainly evaluate correctness through test cases, which are inadequate for GUI applications because these systems are interact…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-23 · Run Hao, Zhuoran Tan
General AI
Model Context Protocol (MCP) is increasingly adopted for tool-integrated LLM agents, but its multi-layer design and third-party server ecosystem expand risks across tool metadata, untrusted outputs, cross-tool flows, multimodal inputs, and supply-chain vectors. Existing MCP benchmarks largely measure robustness to mali…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-23 · Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, Liqiang Nie
General AI
Composed Image Retrieval (CIR) is an important image retrieval paradigm that enables users to retrieve a target image using a multimodal query that consists of a reference image and modification text. Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typ…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-24 · William Dawson, Louis Beal, Yoann Curé, Giuseppe Fisicaro, Dorian Rolland, Luigi Genovese
General AI
Large language models (LLMs) and agentic systems have recently demonstrated potential for automating scientific workflows, including atomistic simulations. However, their deployment in high-performance computing (HPC) environments remains limited by the lack of mechanisms ensuring correctness, reproducibility, and safe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-24 · Mengzhuo Chen, Junjie Wang, Fangwen Mu, Yawen Wang, Zhe Liu, Huanxiang Feng, Qing Wang
General AI
Failure attribution, i.e., identifying the responsible agent and decisive step of a failure, is particularly challenging in LLM-based multi-agent systems (MAS) due to their natural-language reasoning, nondeterministic outputs, and intricate interaction dynamics. A reliable benchmark is therefore essential to guide and …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-27 · Zahra Dehghanighobadi, Asja Fischer
General AI
Long-context reasoning is a critical capability of large language models (LLMs), enabling applications such as long-document understanding, summarization, and code generation. However, efficient autoregressive inference relies on the key-value (KV) cache, whose memory footprint grows linearly with sequence length, lead…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-05-11 · Lungchuan Chen
Research Track A · General AI
Memory consolidation, the process by which transient experiences are transformed into stable, structured representations, is a foundational organizing principle in the human brain, yet it remains largely unexplored as a design principle for modern sequence models. In this work, we leverage established neuroscientific t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-05-12 · Miaosen Zhang, Xiaohan Zhao, Zhihong Tan, Zhou Huoshen, Yijia Fan, Yifan Yang, Kai Qiu, Bei Liu, Justin Wagle, Chenzhong Yin, Mingxi Cheng, Ji Li, Qi Dai, Chong Luo, Xu Yang, Xin Geng, Baining Guo
Research Track B · General AI
Computer-use agents (CUAs) automate on-screen work, as illustrated by GPT-5.4 and Claude. Yet their reliability on complex, low-frequency interactions is still poor, limiting user trust. Our analysis of failure cases from advanced models suggests a long-tail pattern in GUI operations, where a relatively small fraction …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-05-12 · Jacob Fein-Ashley, Paria Rashidinejad
General AI
Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, improving language modeling and reasoning. Yet recurrent architectures remain unstable to train, costly to optimize and deploy, and constrained to small, fixed recurrence depths. We intro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-04-30 · Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng
General AI
Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing explicit linguistic reasoning that suffers from latency and discretization, or utilizing more expressive continuous latent…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-04-30 · Tao Ge, Baolin Peng, Hao Cheng, Jianfeng Gao
General AI
Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at S…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-05-01 · Yan Fang, Mengcheng Lan, Zilong Huang, Weixian Lei, Yunqing Zhao, Yujie Zhong, Yingchen Yu, Qi She, Yao Zhao, Yunchao Wei
General AI
In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining framework for Vision Transformers (ViTs) designed for multimodal large language models (MLLMs). To better align vision encoders with the autoregressive nature of LLMs, GenLI…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-05-01 · Yuan Li, Jun Hu, Jiaxin Jiang, Bryan Hooi, Bingsheng He
General AI
Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constra…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-05-01 · Sailesh Panda, Pritam Kadasi, Abhishek Upperwal, Mayank Singh
General AI
Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithfully execute the procedure specified in a prompt. We study this question through a controlled diagnostic benchmark for procedural execution, where models are given a st…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-05-04 · Mohamad Khajezade, Fatemeh H. Fard, Mohamed Sami Shehata
General AI
Cross-language code clone detection (X-CCD) is challenging because semantically equivalent programs written in different languages often share little surface similarity. Although large language models (LLMs) have shown promise for semantic clone detection, their use as black-box systems raises concerns about cost, repr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-05-04 · Xin Zhang, Qiqi Tao, Jiawei Du, Moyun Liu, Joey Tianyi Zhou
General AI
Continuous latent-space reasoning offers a compact alternative to textual chain-of-thought for multimodal models, enabling high-dimensional visual evidence to be integrated without explicit reasoning tokens. However, we identify a previously overlooked optimization pathology in existing latent visual reasoning methods:…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-03-22 · Shenghan Chen, Yiming Liu, Yanzhen Wang, Yujia Wang, Xiankai Lu
Research Track A · General AI
Balancing the performance trade-off on long-tail (LT) data distributions remains a long-standing challenge. In this paper, we posit that this dilemma stems from a phenomenon called "tail performance degradation" (the model tends to severely overfit on head classes while quickly forgetting tail classes) and pose a solution …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 13.0
2026-03-25 · Risa Shinoda, Kaede Shiohara, Nakamasa Inoue, Kuniaki Saito, Hiroaki Santo, Fumio Okura
General AI
Understanding animal species from multimodal data poses an emerging challenge at the intersection of computer vision and ecology. While recent biological models, such as BioCLIP, have demonstrated strong alignment between images and textual taxonomic information for species identification, the integration of the audio …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-03-25 · Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi, Jacob Andreas, Yoon Kim
General AI
Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mode. While this is generally not a problem for benchmark-style evaluations that assume one correct answer, many real-wor…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-03-30 · He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, Zixian Huang, Sheng Yuan, Qinxiu Cheng, Xinchen Xie, Yicheng Chen, Yining Li, Jiaxing Xie, Huanan Dong, Yaguang Wu, Xiangjun Huang, Jian Yang, Hui Wang, Bowen Zhou, Bowen Li, Qipeng Guo, Kai Chen
General AI
We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-04-01 · Xingxing Weng, Ruifeng Ni, Chao Pang, XiangYu Hao, Yishan Wang, Xiaokang Zhang, Wei Xu, Gui-Song Xia
Research Track A · General AI
Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-04-05 · Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen
General AI
Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis of noisy label mech…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-04-05 · Satyam Kumar, Saurabh Jha
General AI
Deploying large language models (LLMs) on heterogeneous edge devices demands frameworks that jointly optimize energy efficiency, inference quality, and reliability. Our prior QEIL v1 (Kumar & Jha, 2026) achieved a 4.82x IPW improvement but relied on static efficiency factors, greedy optimization, and unverified candidate…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-04-06 · Yujian Liu, Jiabao Ji, Li An, Tommi Jaakkola, Yang Zhang, Shiyu Chang
General AI
Agent skills, which are reusable, domain-specific knowledge artifacts, have become a popular mechanism for extending LLM-based agents, yet formal benchmarks of skill usage performance remain scarce. Existing skill benchmarking efforts focus on overly idealized conditions, where LLMs are directly provided with hand-cr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-04-14 · Amar Gahir, Varshil Patel, Shreyank N Gowda
Research Track A · General AI
Deep neural networks are typically trained by uniformly sampling large datasets across epochs, despite evidence that not all samples contribute equally throughout learning. Recent work shows that progressively reducing the amount of training data can improve efficiency and generalization, but existing methods rely on f…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-04-16 · Tingjia Miao, Wenkai Jin, Muhua Zhang, Jinxin Tan, Yuelin Hu, Tu Guo, Jiejun Zhang, Yuhan Wang, Wenbo Li, Yinuo Gao, Shuo Chen, Weiqi Jiang, Yayun Hu, Zixing Lei, Xianghe Pang, Zexi Liu, Yuzhi Zhang, Linfeng Zhang, Kun Chen, Wei Wang, Weinan E, Siheng Chen
General AI
The paradigm of agentic science requires AI systems to conduct robust reasoning and engage in long-horizon, autonomous exploration. However, current scientific benchmarks remain confined to domain knowledge comprehension and complex reasoning, failing to evaluate the exploratory nature and procedural complexity of real…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-04-16 · Guy Kaplan, Zorik Gekhman, Zhen Zhu, Lotem Rozner, Yuval Reif, Swabha Swayamdipta, Derek Hoiem, Roy Schwartz
Research Track A · General AI
Large language models are prone to hallucinating factually incorrect statements. A key source of these errors is exposure to new factual information through supervised fine-tuning (SFT), which can increase hallucinations w.r.t. knowledge acquired during pre-training. In this work, we explore whether SFT-induced halluci…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-05-07 · Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu
Research Track B · General AI
GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution metho…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-05-07 · Xin Gao, Ruiyi Zhang, Meixi Du, Peijia Qin, Pengtao Xie
Research Track A · General AI
Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensiv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-05-07 · Hao Ye, Jisheng Dang, Junfeng Fang, Bimei Wang, Yizhou Zhang, Ning Lv, Wencan Zhang, Hong Peng, Bin Hu, Tat-Seng Chua
Research Track A · General AI
Recent extensive research has demonstrated that the enhanced reasoning capabilities acquired by models through Reinforcement Learning with Verifiable Rewards (RLVR) are primarily concentrated within the rank-1 components. Predicated on this observation, we employed Periodic Rank-1 Substitution and identified a counteri…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.9
2026-04-29 · Mingze Li, Yu Rong, Songyou Li, Lihong Wang, Jiacheng Cen, Liming Wu, Anyi Li, Zongzhao Li, Qiuliang Liu, Rui Jiao, Tian Bian, Pengju Wang, Hao Sun, Jianfeng Zhang, Ji-Rong Wen, Deli Zhao, Shifeng Jin, Tingyang Xu, Wenbing Huang
General AI
The discovery of novel materials is critical for global energy and quantum technology transitions. While deep learning has fundamentally reshaped this landscape, existing predictive or generative models typically operate in isolation, lacking the autonomous orchestration required to execute the full discovery process. …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.9
2026-05-01 · Dongxin Guo, Jikun Wu, Siu Ming Yiu
Research Track B · General AI
AI agents execute tens to hundreds of chained LLM calls per task, yet GPU schedulers treat each call as independent, discarding gigabytes of intermediate state between steps and inflating end-to-end latency by 3-8x. We argue that this request-level abstraction is fundamentally mismatched to compound AI workloads, and p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-03-25 · Bingqing Wei, Zhongyu Xia, Dingai Liu, Xiaoyu Zhou, Zhiwei Lin, Yongtao Wang
General AI
Vision-language models (VLMs) have shown remarkable general capabilities, yet embodied agents built on them fail at complex tasks, often skipping critical steps, proposing invalid actions, and repeating mistakes. These failures arise from a fundamental gap between the static training data of VLMs and the physical inter…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-03-26 · Zirui Zhang, Haoyu Dong, Kexin Pei, Chengzhi Mao
General AI
Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-03-26 · Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
General AI
Vision-language-action models have reshaped autonomous driving to incorporate language into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To addr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-03-27 · Mahesh Bhosale, Abdul Wasi, Shantam Srivastava, Shifa Latif, Tianyu Luan, Mingchen Gao, David Doermann, Xuan Gong
General AI
While powerful in image-conditioned generation, multimodal large language models (MLLMs) can display uneven performance across demographic groups, highlighting fairness risks. In safety-critical clinical settings, such disparities risk producing unequal diagnostic narratives and eroding trust in AI-assisted decision-ma…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-03-29 · Zhaopeng Feng, Liangcai Su, Zhen Zhang, Xinyu Wang, Xiaotian Zhang, Xiaobin Wang, Runnan Fang, Qi Zhang, Baixuan Li, Shihao Cai, Rui Ye, Hui Chen, Jiang Yong, Joey Tianyi Zhou, Chenxiong Qian, Pengjun Xie, Bryan Hooi, Zuozhu Liu, Jingren Zhou
Research Track B · General AI
As large language models (LLMs) evolve into autonomous agents for long-horizon information-seeking, managing finite context capacity has become a critical bottleneck. Existing context management methods typically commit to a single fixed strategy throughout the entire trajectory. Such static designs may work well in so…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-03-30 · Ikechukwu Uchendu, Swati Goel, Karly Hou, Ebrahim Songhori, Kuang-Huei Lee, Joe Wenjie Jiang, Vijay Janapa Reddi, Vincent Zhuang
General AI
We propose using Vision-Language Models (VLMs) for macro placement in chip floorplanning, a complex optimization task that has recently shown promising advancements through machine learning methods. Because human designers rely heavily on spatial reasoning to arrange components on the chip canvas, we hypothesize that V…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-01 · Yutao Yang, Junsong Li, Qianjun Pan, Jie Zhou, Kai Chen, Qin Chen, Jingyuan Zhao, Ningning Zhou, Xin Li, Liang He
Research Track A · General AI
Existing methods for AI psychological counselors predominantly rely on supervised fine-tuning using static dialogue datasets. However, this contrasts with human experts, who continuously refine their proficiency through clinical practice and accumulated experience. To bridge this gap, we propose an Experience-Driven Li…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-02 · Ruozhen He, Nisarg A. Shah, Qihua Dong, Zilin Xiao, Jaywon Koo, Vicente Ordonez
General AI
Existing visual grounding benchmarks primarily evaluate alignment between image regions and literal referring expressions, where models can often succeed by matching a prominent named category. We explore a complementary and more challenging setting of scenario-based visual grounding, where the target must be inferred …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-02 · Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
General AI
Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-03 · Renze Lou, Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Suman Nath, Wenpeng Yin, Jianfeng Gao
Research Track B · General AI
As web agents rapidly evolve, an increasing body of work has moved beyond conventional atomic browser interactions and explored tool use as a higher-level action paradigm. Although prior studies have shown the promise of tools, their conclusions are often drawn from limited experimental scales and sometimes non-compara…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-06 · Tien Nguyen, Muhammad Ali Gulzar, Kirshanthan Sundararajah
General AI
Scientific software relies on high-precision computation, yet finite floating-point representations can introduce precision errors that propagate in safety-critical domains. Despite the growing use of large language models (LLMs) in scientific applications, their reliability in handling floating-point numerical stabili…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-06 · Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen
General AI
Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position during RoPE, making representative queries very few, leading to poor top-…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-07 · Ahmet Rasim Emirdagi, Süleyman Aslan, Mısra Yavuz, Görkay Aydemir, Yunus Bilge Kurt, Nasrin Rahimi, Burak Can Biner, M. Akın Yılmaz
General AI
Metal artifacts from high-attenuation implants severely degrade CT image quality, obscuring critical anatomical structures and posing a challenge for standard deep learning methods that require extensive paired training data. We propose a paradigm shift: reframing artifact reduction as an in-context reasoning task by a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-09 · Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang, Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, Yang Yang
Research Track A · General AI
Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-28 · Wei-Chun Chen, Yu-Xuan Chen, I-Fang Chung, Ying-Jia Lin
General AI
Accurate nutrient estimation from unstructured recipe text is an important yet challenging problem in dietary monitoring, due to ambiguous ingredient terminology and highly variable quantity expressions. We systematically evaluate models spanning a wide range of representational capacity, from lexical matching methods …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-04-28 · Pengcheng Fang, Yuxia Chen, Xiaohao Cai
General AI
Video temporal grounding (VTG) aims to localize the start and end timestamps of the event described by a given query within an untrimmed video. Despite the strong open-world video understanding and recognition ability of video large language models (Vid-LLMs), outputting precise temporal grounding information remains c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-07 · Omar El Khalifi, Thomas Rossi, Oscar Fossey, Thibault Fouque, Ulysse Mizrahi, Philip Torr, Ivan Laptev, Fabio Pizzati, Baptiste Bellot-Gurlet
General AI
For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-07 · Lujia Zhong, Yihao Xia, Jianwei Zhang, Shuo Huang, Jiaxin Yue, Mingyang Xia, Yonggang Shi
General AI
Multimodal neuroimaging analysis often involves complex, modality-specific preprocessing workflows that require careful configuration, quality control, and coordination across heterogeneous toolchains. Beyond preprocessing, downstream statistical analysis and disease classification commonly require task-specific code, …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.8
2026-05-11 · Ihor Stepanov, Oleksandr Lukashov, Mykhailo Shtopko, Vivek Kalyanarangan
General AI
Joint named entity recognition (NER) and relation extraction (RE) is a fundamental task in natural language processing for constructing knowledge graphs from unstructured text. While recent approaches treat NER and RE as separate tasks requiring distinct models, we introduce GLiNER-Relex, a unified architecture that ex…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.6
2026-05-11 · Pau de las Heras Molins, Beyazit Yalcinkaya, Lasse Peters, David Fridovich-Keil, Georgios Bakirtzis
General AI
Multi-objective reinforcement learning (MORL) allows a user to express preference over outcomes in terms of the relative importance of the objectives, but standard metrics cannot capture whether changes in preference reliably change the agent's behavior in the intended way, a property termed controllability. As a resul…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-01 · Mohammad R. Abu Ayyash
Research Track A · General AI
We present Brainstacks, a modular architecture for continual multi-domain fine-tuning of large language models that packages domain expertise as frozen adapter stacks composing additively on a shared frozen base at inference. Five interlocking components: (1) MoE-LoRA with Shazeer-style noisy top-2 routing across all s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-04-01 · Henry Peng Zou, Chunyu Miao, Wei-Chieh Huang, Yankai Chen, Yue Zhou, Hanrong Zhang, Yaozu Wu, Liancheng Fang, Zhengyao Gu, Zhen Zhang, Kening Zheng, Fangxin Wang, Yi Nian, Shanghao Li, Wenzhe Fan, Langzhou He, Weizhi Zhang, Xue Liu, Philip S. Yu
Research Track B · General AI
As LLM agents transition from short, static problem solving to executing complex, long-horizon tasks in dynamic environments, the ability to handle user interruptions, such as adding requirements or revising goals, during mid-task execution is becoming a core requirement for realistic deployment. However, existing bench…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-10 · Jingyu Zhang, Tianjian Li, William Jurayj, Hongyuan Zhan, Benjamin Van Durme, Daniel Khashabi
General AI
Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant parad…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-12 · Yu Li, Xiaoran Shang, Qizhi Pei, Yun Zhu, Xin Gao, Honglin Lin, Zhanping Zhong, Zhuoshi Pan, Zheng Liu, Xiaoyang Wang, Conghui He, Dahua Lin, Feng Zhao, Lijun Wu
General AI
Post-training data plays a pivotal role in shaping the capabilities of Large Language Models (LLMs), yet datasets are often treated as isolated artifacts, overlooking the systemic connections that underlie their evolution. To disentangle these complex relationships, we introduce the concept of data lineage to the LLM e…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-20 · Jiaqi Wang, Haoge Deng, Ting Pan, Yang Liu, Chengyuan Wang, Fan Zhang, Yonggang Qi, Xinlong Wang
General AI
Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains largely unexplored. We observe that naively applying GRPO to UDM leads to training instability and marginal performance gains. To address thi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-24 · Shaoang Li, Yanhang Shi, Yufei Li, Mingfu Liang, Xiaohan Wei, Yunchen Pu, Fei Tian, Chonglin Sun, Frank Shyu, Luke Simon, Sandeep Pandey, Xi Liu, Jian Li
General AI
Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or rewriting the input, which can discard or …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-27 · Han Wang, Xiaodong Yu, Jialian Wu, Jiang Liu, Ximeng Sun, Mohit Bansal, Zicheng Liu
General AI
Large language models (LLMs) achieve strong reasoning performance by allocating substantial computation at inference time, often generating long and verbose reasoning traces. While recent work on efficient reasoning reduces this overhead through length-based rewards or pruning, many approaches are post-trained under a …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-05-07 · Bomin Wang, Hangqi Zhou, Yibo Gao, Xiahai Zhuang
Research Track A · General AI
Continual learning (CL) is essential for deploying medical image segmentation models in clinical environments where imaging domains, anatomical targets, and diagnostic tasks evolve over time. However, continual segmentation still faces three main challenges. First, the scenarios for this task remain insufficiently stan…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-05-12 · Zhennan Chen, Junwei Zhu, Xu Chen, Jiangning Zhang, Jiawei Chen, Zhuoqi Zeng, Wei Zhang, Chengjie Wang, Jian Yang, Ying Tai
General AI
Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-05-12 · Lezhong Wang, Mehmet Onurcan Kaya, Siavash Bigdeli, Jeppe Revall Frisvad
General AI
Recent single-image relighting methods, powered by advanced generative models, have achieved impressive photorealism on synthetic benchmarks. However, their effectiveness in the complex visual landscape of the real world remains largely unverified. A critical gap exists, as current datasets are typically designed for m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.4
2026-04-29 · Karthik Charan Raghunathan, Christian Metzner, Laura Kriener, Melika Payvand
Research Track A · General AI
In a continual learning setting, we require a model to be plastic enough to learn a new task and stable enough to not disturb previously learned capabilities. We argue that this dilemma has an architectural root. A finite network has limited representational and plastic resources, yet the required capacity depends on p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-13 · Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Lorenzo Sia, Nicolas Richet, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger
General AI
Using behavioural science, health interventions focus on behaviour change by providing a framework to help patients acquire and maintain healthy habits that improve medical outcomes. In-person interventions are costly and difficult to scale, especially in resource-limited regions. Digital health interventions offer a c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-13 · Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia
General AI
Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex: as existing approaches vary substantially in architectures, training data, embodiment configurations, and benchmark-specific en…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-14 · Anne Lee, Gurudutt Hosangadi
Research Track A · General AI
The rapid advancement of AI has changed the character of HPC usage, such as dimensioning, provisioning, and execution. Not only has energy demand been amplified, but existing rudimentary continual learning capabilities also limit the ability of AI to effectively manage HPCs. This paper reviews emerging directions beyond monolith…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-14 · Guoxin Chen, Jie Chen, Lei Chen, Jiale Zhao, Fanzhe Meng, Wayne Xin Zhao, Ruihua Song, Cheng Chen, Ji-Rong Wen, Kai Jia
General AI
Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-16 · Kiran Purohit, Ramasuri Narayanam, Soumyabrata Pal
General AI
Speculative decoding (SD) accelerates large language model inference by allowing a lightweight draft model to propose outputs that a stronger target model verifies. However, its token-centric nature allows erroneous steps to propagate. Prior approaches mitigate this using external reward models, but incur additional la…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-16 · Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Ge Lan, Yue Wang
General AI
Recent advances in reinforcement learning (RL) have improved the reasoning capabilities of large language models (LLMs) and vision-language models (VLMs). However, the widely used Group Relative Policy Optimization (GRPO) consistently suffers from entropy collapse, causing the policy to converge prematurely and lose di…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-17 · Ruiyang Wang, Hao-Lun Hsu, Jiwoo Kim, Miroslav Pajic
General AI
Coordinating multi-robot systems (MRS) to search in unknown environments is particularly challenging for tasks that require semantic reasoning beyond geometric exploration. Classical coordination strategies rely on frontier coverage or information gain and cannot incorporate high-level task intent, such as searching fo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-18 · Jinchang Zhu, Jindong Li, Cheng Zhang, Jiahong Liu, Menglin Yang
Research Track A · General AI
Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversation history as unstructured embedding vectors, retrieving information through semantic similarity. This paradigm fails to …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-19 · Ziqing Zhuang, Linhai Zhang, Jiasheng Si, Deyu Zhou, Yulan He
Research Track A · General AI
Large language models (LLMs) have demonstrated strong reasoning capabilities, and as existing approaches for enhancing LLM reasoning continue to mature, increasing attention has shifted toward meta-reasoning as a promising direction for further improvement. However, most existing meta-reasoning methods remain episodic:…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-21 · Yutian Chen, Shi Guo, Renbiao Jin, Tianshuo Yang, Xin Cai, Yawen Luo, Mingxin Yang, Mulin Yu, Linning Xu, Tianfan Xue
General AI
Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric cons…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-21 · Zhihong Zhang, Jie Zhao, Xiaojian Huang, Jin Xu, Zhuodong Luo, Xin Liu, Jiansheng Wei, Xuejin Chen
General AI
Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key challenges: lack of granularity in preference strength, textual styl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-21 · Feihao Fang, My T. Thai, Yuanyuan Lei
General AI
Large Language Models (LLMs) still struggle with multi-step logical reasoning. Existing approaches either purely refine the reasoning chain in natural language form or attach a symbolic solver as an external module. In this work, we instead ask whether LLMs contain a shared internal logical subspace that simultaneously…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-21 · Perry Dong, Alexander Swerdlow, Dorsa Sadigh, Chelsea Finn
General AI
Some of the most performant reinforcement learning algorithms today can be prohibitively expensive as they use test-time scaling methods such as sampling multiple action candidates and selecting the best one. In this work, we propose FASTER, a method for getting the benefits of sampling-based test-time scaling of diffu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-21 · Yiwen Qiu, Linjuan Wu, Yizhou Liu, Yuchen Yan, Jin Ma, Xu Tan, Yao Hu, Daoxin Zhang, Wenqi Zhang, Weiming Lu, Jun Xiao, Yongliang Shen
General AI
Large language models have achieved remarkable progress on complex reasoning tasks. However, they often implicitly fabricate information when inputs are incomplete, producing confident but unreliable conclusions, a failure mode we term ungrounded reasoning. We argue that this issue arises not from insufficient reason…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-22 · Yupeng Zheng, Xiang Li, Songen Gu, Yuhang Zheng, Shuai Tian, Weize Li, Linbo Wang, Senyu Fei, Pengfei Li, Yinfeng Gao, Zebin Xing, Yilun Chen, Qichao Zhang, Haoran Li, Wenchao Ding
General AI
Recent advances in Vision-Language-Action (VLA) models have opened new avenues for robot manipulation, yet existing methods exhibit limited efficiency and a lack of high-level knowledge and spatial awareness. To address these challenges, we propose PokeVLA, a lightweight yet powerful foundation model for embodied manip…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-22 · Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele
General AI
Large vision-language models (LVLMs) have demonstrated impressive performance in various multimodal understanding and reasoning tasks. However, they still struggle with object hallucinations, i.e., the claim of nonexistent objects in the visual input. To address this challenge, we propose Region-aware Chain-of-Verifica…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-23 · Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Chaorui Deng, Kunchang Li, Zihan Ding, Yuwei Guo, Fuyun Wang, Fangqi Zhu, Xiaonan Nie, Shenhan Zhu, Shanchuan Lin, Hongsheng Li, Weilin Huang, Guang Shi, Haoqi Fan
General AI
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-24 · Yunquan Chen, Haoyu Chen
General AI
Understanding social dominance in animal behavior is critical for neuroscience and behavioral studies. In this work, we explore the capability of Multimodal Large Language Models (MLLMs) to analyze raw behavioral video of mice and predict their dominance hierarchy. We introduce MTT-Bench, a novel benchmark comprising an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-27 · Cheng Wang, Zhibin He, Zhihao Peng, Shengyuan Liu, Yufan Hu, Lichao Sun, Xiang Li, Yixuan Yuan
General AI
Agentic artificial intelligence systems promise to accelerate scientific workflows, but neuroimaging poses unique challenges: heterogeneous modalities (sMRI, fMRI, dMRI, EEG), long multi-stage pipelines, and persistent reproducibility risks. To address this gap, we present NeuroClaw, a domain-specialized multi-agent re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-27 · Amal AKLI, Mike PAPADAKIS, Maxime CORDY, Yves Le TRAON
General AI
Large language models are increasingly used for code generation, yet the correctness of their outputs depends not only on model capability but also on how tasks are specified. Prior studies demonstrate that small changes in natural language prompts, particularly under-specification, can substantially reduce code correct…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.2
2026-04-30 · Gyoung S. Na, Chanyoung Park
Research Track A · General AI
Deriving governing equations from empirical observations is a longstanding challenge in science. Although artificial intelligence (AI) has demonstrated substantial capabilities in function approximation, the discovery of explainable and extrapolatable equations remains a fundamental limitation of modern AI, posing a ce…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.2
2026-04-30 · Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan, Manmohan Chandraker
General AI
Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.2
2026-05-01 · Pavlin G. Poličar, Andraž Pevcin, Blaž Zupan
General AI
Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-ans…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.2
2026-05-04 · Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong
General AI
Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, assigning high rewards to flawed reasoni…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-02-02 · Yanjia Huang, Yunuo Chen, Ying Jiang, Jinru Han, Zhengzhong Tu, Yin Yang, Chenfanfu Jiang
General AI
The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-03-04 · Hung Tran, Langston Nashold, Rayan Krishnan, Antoine Bigeard, Alex Gu
Research Track B · General AI
Code generation has emerged as one of AI's highest-impact use cases, yet existing benchmarks measure isolated tasks rather than the complete "zero-to-one" process of building a working application from scratch. We introduce Vibe Code Bench, a benchmark of 100 web application specifications (50 public validation, 50 hel…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 12.0
2026-03-19 · Haochen Zhao, Shaoyang Cui
Research Track B · General AI
Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 12.0
2026-03-22 · Liang Ding
Research Track B · General AI
LLM-as-Judge evaluation fails agent tasks because a fixed rubric cannot capture what matters for each task: code debugging demands Correctness and Error Handling; web navigation demands Goal Alignment and Action Efficiency. We present ADARUBRIC, which closes this gap by generating task-specific evaluation rubrics on th…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-03-29 · Meituan LongCat Team, Bin Xiao, Chao Wang, Chengjiang Li, Chi Zhang, Chong Peng, Hang Yu, Hao Yang, Haonan Yan, Haoze Sun, Haozhe Zhao, Hong Liu, Hui Su, Jiaqi Zhang, Jiawei Wang, Jing Li, Kefeng Zhang, Manyuan Zhang, Minhao Jing, Peng Pei, Quan Chen, Taofeng Xue, Tongxin Pan, Xiaotong Li, Xiaoyang Li, Xiaoyu Zhao, Xing Hu, Xinyang Lin, Xunliang Cai, Yan Bai, Yan Feng, Yanjie Li, Yao Qiu, Yerui Sun, Yifan Lu, Ying Luo, Yipeng Mei, Yitian Chen, Yuchen Xie, Yufang Liu, Yufei Chen, Yulei Qian, Yuqi Peng, Zhihang Yu, Zhixiong Han, Changran Wang, Chen Chen, Dian Zheng, Fengjiao Chen, Ge Yang, Haowei Guo, Haozhe Wang, Hongyu Li, Huicheng Jiang, Jiale Hong, Jialv Zou, Jiamu Li, Jianping Lin, Jiaxing Liu, Jie Yang, Jing Jin, Jun Kuang, Juncheng She, Kunming Luo, Kuofeng Gao, Lin Qiu, Linsen Guo, Mianqiu Huang, Qi Li, Qian Wang, Rumei Li, Siyu Ren, Wei Wang, Wenlong He, Xi Chen, Xiao Liu, Xiaoyu Li, Xu Huang, Xuanyu Zhu, Xuezhi Cao, Yaoming Zhu, Yifei Cao, Yimeng Jia, Yizhen Jiang, Yufei Gao, Zeyang Hu, Zhenlong Yuan, Zijian Zhang, Ziwen Wang
General AI
The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented architectures and subopt…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-03-30 · Alkis Sygkounas, Rishi Hazra, Andreas Persson, Pedro Zuidberg Dos Martires, Amy Loutfi
Research Track A · General AI
A central challenge in building continually improving agents is that training environments are typically static or manually constructed. This restricts continual learning and generalization beyond the training distribution. We address this with COvolve, a co-evolutionary framework that leverages large language models (…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-03-30 · Deepak Akkil, Mowafak Allaham, Amal Raj, Tamer Abuelsaad, Ravi Kokku
Research Track B · General AI
Reliable evaluation of AI agents operating in complex, real-world environments requires methodologies that are robust, transparent, and contextually aligned with the tasks agents are intended to perform. This study identifies persistent shortcomings in existing AI agent evaluation practices that are particularly acute …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-04-07 · Chenghao Li, Jun Liu, Songbo Zhang, Huadong Jian, Hao Ni, Lik-Hang Lee, Sung-Ho Bae, Guoqing Wang, Yang Yang, Chaoning Zhang
General AI
Multimodal LLM agents operating in complex game environments must continually reuse past experience to solve new tasks efficiently. In this work, we propose Echo, a transfer-oriented memory framework that enables agents to derive actionable knowledge from prior interactions rather than treating memory as a passive repo…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-04-09 · Ziqi Cai, Taoyu Yang, Zheng Chang, Si Li, Han Jiang, Shuchen Weng, Boxin Shi
General AI
Diffusion models have achieved remarkable progress in video generation, but their controllability remains a major limitation. Key scene factors such as layout, lighting, and camera trajectory are often entangled or only weakly modeled, restricting their applicability in domains like filmmaking and virtual production wh…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-04-14 · Chuang Peng, Wei Zhang, Renshuai Tao, Xinhao Zhang, Jian Yang
Research Track B · General AI
Text-based web agents offer computational efficiency for autonomous web navigation, yet developing robust agents remains challenging due to the noisy and heterogeneous nature of real-world HTML. Standard Supervised Fine-Tuning (SFT) approaches fail in two critical dimensions: they lack discrimination capabilities to re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-04-15 · Aaron Pache, Mark CW van Rossum
Research Track A · General AI
Synaptic plasticity is metabolically expensive, yet animals continuously update their internal models without exhausting energy reserves. However, when artificial neural networks are trained, the network parameters are typically updated on every sample that is presented, even if the sample was classified correctly. Ins…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-04-27 · Hongxin Li, Yuntao Chen, Zhaoxiang Zhang
Research Track B · General AI
Graphical User Interface (GUI) element grounding (precisely locating elements on screenshots based on natural language instructions) is fundamental for agents interacting with GUIs. Deploying this capability directly on resource-constrained devices like mobile phones is increasingly critical for GUI agents requiring lo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-05-07 · Xinmiao Huang, Jinwei Hu, Rajarshi Roy, Changshun Wu, Yi Dong, Xiaowei Huang
Research Track B · General AI
Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. We introduce PrefixG…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-05-08 · Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao, Tianqing Zhu, Xiaohua Jia
Research Track B · General AI
Browser agents are increasingly deployed in long-horizon tasks, which require executing extended action chains to accomplish user goals. However, this prolonged execution process provides attackers with more opportunities to inject malicious instructions. Existing prompt injection attacks against browser agents expose …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-03-25 · Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu
Research Track A · General AI
Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks remains fundamentally limited by the absence of long-term memory, causal failure attribution, and dynamic intervention c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-03-26 · Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li
General AI
Human driving behavior is inherently personal, shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectiv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-03-27 · Wonyoung Lee, Wooseong Jeong, Kuk-Jin Yoon
General AI
Model merging combines independently fine-tuned checkpoints without joint multi-task training. In the era of foundation models, fine-tuning with Low-Rank Adaptation (LoRA) is prevalent, making LoRA merging a promising target. Existing approaches can work in homogeneous settings where all target tasks are classification …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-03-30 · Haozhe Qi, Kevin Qu, Mahdi Rad, Rui Wang, Alexander Mathis, Marc Pollefeys
General AI
Long video understanding remains challenging for Multi-modal Large Language Models (MLLMs) due to high memory costs and context-length limits. Prior approaches mitigate this by scoring and selecting frames/tokens within short clips, but they lack a principled mechanism to (i) compare relevance across distant video clip…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-03-30 · Philip Schroeder, Thomas Weng, Karl Schmeckpeper, Eric Rosen, Stephen Hart, Ondrej Biza
General AI
Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enablin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-04-02 · Minda Zhao, Yutong Yang, Chufei Peng, Rachel Gonsalves, Weiyue Li, Ruyi Yang, Zhixi Liu, Mengyu Wang
General AI
Emotional tone is pervasive in human communication, yet its influence on large language model (LLM) behaviour remains unclear. Here, we examine how first-person emotional framing in user-side queries affects LLM performance across six benchmark domains, including mathematical reasoning, medical question answering, readi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-04-02 · Klemens Iten, Bruce Lee, Chenhao Li, Lenart Treven, Andreas Krause, Bhavya Sukhija
General AI
Learning-based control methods typically assume stationary system dynamics, an assumption often violated in real-world systems due to drift, wear, or changing operating conditions. We study reinforcement learning for control under time-varying dynamics. We consider a continual model-based reinforcement learning setting…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-04-02 · Masafumi Enomoto, Ryoma Obara, Haochen Zhang, Masafumi Oyamada
Research Track B · General AI
Web agents based on large language models (LLMs) rely on observations of web pages (commonly represented as HTML) as the basis for identifying available actions and planning subsequent steps. Prior work has treated the verbosity of HTML as an obstacle to performance and adopted observation reduction as a standard p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-04-04 · Hessen Bougueffa Eutamene, Abdellah Zakaria Sellam, Abdelmalik Taleb-Ahmed, Abdenour Hadid
Research Track A · General AI
Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models; however, while pixel-level artifacts vary across models, frequency-domain signatures exhibit greater consistency, providing a promising foundation for cross-generator…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-04-06 · Parsa Hosseini, Sumit Nawathe, Mahdi Salmani, Meisam Razaviyayn, Soheil Feizi
General AI
Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the final answer. In this wo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-04-06 · Mingzhe Du, Luu Anh Tuan, Dong Huang, See-kiong Ng
Research Track A · General AI
The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries w…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-04-06 · LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, Aviral Kumar
General AI
Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-04-09 · Kaiyuan Tian, Yu Tang, Gongqingjian Jiang, Baihui Liu, Yifu Gao, Xialin Su, Linbo Qiao, Dongsheng Li
General AI
Full-parameter fine-tuning of large language models is constrained by substantial GPU memory requirements. Low-rank adaptation methods mitigate this challenge by updating only a subset of parameters. However, these approaches often limit model expressiveness and yield lower performance than full-parameter fine-tuning. …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-04-09 · Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou
General AI
We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-07 · Daniel Zheng, Ingrid von Glehn, Yori Zwols, Iuliya Beloshapka, Lars Buesing, Daniel M. Roy, Martin Wattenberg, Bogdan Georgiev, Tatiana Schmidt, Andrew Cowie, Fernanda Viegas, Dimitri Kanevsky, Vineet Kahlon, Hartmut Maennel, Sophia Alj, George Holland, Alex Davies, Pushmeet Kohli
General AI
We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature search, computation…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-07 · Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig
General AI
We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-03-24 · Xinyao Wu, Zhe Xu, Cheng Chen, Jiawei Ma, Yefeng Zheng, Raymond Kai-yu Tong
Research Track A · General AI
Class-incremental learning (CIL) in medical image-guided diagnosis requires retaining prior diagnostic knowledge while adapting to newly emerging disease categories, which is critical for scalable clinical deployment. This problem is particularly challenging due to heterogeneous data and privacy constraints that preven…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.5
2026-04-08 · Wonseon Lim, Jaesung Lee, Dae-Won Kim
Research Track A · General AI
Continual learning (CL) on edge devices requires not only high accuracy but also training-time efficiency to support on-device adaptation under strict memory and computational constraints. While prompt-based continual learning (PCL) is parameter-efficient and achieves competitive accuracy, prior work has focused mainly…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-10 · Han Luo, Guy Laban
General AI
Large language models are increasingly deployed in multi-turn settings such as tutoring, support, and counseling, where reliability depends on preserving consistent roles, personas, and goals across long horizons. This requirement becomes critical when LLMs are used to generate synthetic dialogues for training and eval…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-15 · Tianshuo Yang, Guanyu Chen, Yutian Chen, Zhixuan Liang, Yitian Liu, Zanxin Chen, Chunpu Xu, Haotian Liang, Jiangmiao Pang, Yao Mu, Ping Luo
General AI
While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often compromises the profound reasoning capabilities inherited from their base Vision-Language Models (VLMs). To resolve this fundamental trade-off, we propose HiVLA, a visu…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-15 · Haoran Xu, Kaiwen Hu, Somayeh Sojoudi, Amy Zhang
General AI
We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base model in LLM RL finetuning) is essential to prevent value over-optimization caused by erroneous out-of-distribution extrapolation. Existing methods either rely on repara…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-04-22 · Shanshan Zhong, Yi Lu, Jingjie Ning, Yibing Wan, Lihan Feng, Yuyi Ao, Leonardo F. R. Ribeiro, Markus Dreyer, Sean Ammirati, Chenyan Xiong
Research Track A · General AI
Skills have become the de facto way to enable LLM agents to perform complex real-world tasks with customized instructions, workflows, and tools, but how to learn them automatically and effectively remains unclear. We introduce SkillLearnBench, the first benchmark for evaluating continual skill learning methods, compris…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-23 · Itay Nakash, George Kour, Ateret Anaby-Tavor
General AI
Large language models (LLMs) are increasingly deployed as customer-facing agents, yet evaluating their reliability remains challenging due to stochastic, multi-turn interactions. Current evaluation protocols rely on linear Monte Carlo rollouts of complete agent-user conversations to estimate success. However, this appr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-05-06 · William T. Redman, Erik C. Johnson, Brian Robinson
Research Track A · General AI
Identifying and exploiting common features across domains is at the heart of the human ability to make analogies, and is believed to be crucial for the ability to continually learn. To do this successfully, general and flexible computational strategies must be developed. While the extent to which Transformer neural net…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.4
2026-04-29 · Yibin Luo, Shiwei Gao, Huichuan Zheng, Youyou Lu, Jiwu Shu
General AI
Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism combined with CPU offloading mitigates these hardware bottlenecks by reducing communication overhead. However, existing PP schedules suffer fr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.4
2026-04-30 · Haofei Yu, Yining Zhao, Lenore Blum, Manuel Blum, Paul Pu Liang
Research Track B · General AI
Despite remarkable advances, today's AI systems remain narrow in scope, falling short of the flexible, adaptive, and multisensory intelligence that characterizes human capabilities. This gap has fueled longstanding debates about whether AI might one day achieve human-like generality or even consciousness, and whether t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 11.4
2026-04-30 · Hanzhong Guo, Jie Wu, Jie Liu, Yu Gao, Zilyu Ye, Linxiao Yuan, Xionghui Wang, Yizhou Yu, Weilin Huang
General AI
While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is the lack of a robust general reward model for all editing tasks. Existing edit reward models usually give overall scores wi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.4
2026-05-01 · Indraneil Paul, Glavaš Glavas, Iryna Gurevych
General AI
Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.4
2026-05-04 · Zhisheng Tang, Mayank Kejriwal
Research Track B · General AI
Research funding discovery remains fundamentally fragmented: researchers navigate disparate agency portals (e.g., in the United States, NSF, NIH, DARPA, Grants.gov, and many others) with heterogeneous interfaces, search capabilities, and data schemas. We present a compound AI system that unifies this landscape through …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-14 · Yiyang Huang, Yitian Zhang, Yizhou Wang, Mingyuan Zhang, Liang Shi, Huimin Zeng, Yun Fu
General AI
Despite significant progress in video-language modeling, hallucinations remain a persistent challenge in Video Large Language Models (Vid-LLMs), referring to outputs that appear plausible yet contradict the content of the input video. This survey presents a comprehensive analysis of hallucinations in Vid-LLMs and intro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-14 · Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram
General AI
Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness when trivially constrained? We show that simple lexical constraints (banning a single punctuation character or common word) cause instruction-tuned LLMs to collapse their responses, losing 14–48% of compre…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-14 · Joel Fokou
General AI
Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise applications will embed AI copilots by the end of 2026. As agents gain the ability to execute real-world actions (reading files, running commands, making network requests, modify…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-16 · Marcel Wagenländer, Otto White, Britannio Jarrett, Pedro Silvestre, Yanda Tao, Guo Li, Huanzhou Zhu, Llúis Vilanova, Peter Pietzuch
General AI
Agentic workflows carry out complex tasks by orchestrating multiple large language models (LLMs) and tools. Serving such workflows at a target throughput with low latency is challenging because they can be defined using arbitrary agentic frameworks and exhibit unpredictable execution times: execution may branch, fan-ou…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-16 · Xiao-Liang Qi
General AI
This article argues that the most important significance of the AI revolution, especially the rise of large language models, lies not simply in automation, but in a fundamental change in how complex information and human know-how are carried, replicated, and shared. From this perspective, AI for Science is especially i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-17 · Yi Lin, Yihao Ding, Yonghui Wu, Yifan Peng
General AI
Automated 3D radiology report generation often suffers from clinical hallucinations and a lack of the iterative verification found in human practice. While recent Vision-Language Models (VLMs) have advanced the field, they typically operate as monolithic "black-box" systems without the collaborative oversight character…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-20 · Xirui Li, Ming Li, Derry Xu, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh, Tianyi Zhou
General AI
Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diverse, verified environments on demand. To this end, we introduce ClawEnvKit, an aut…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-20 · Manan Gupta, Dhruv Kumar
General AI
Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce Latent Phase-Shift Rollback (LPSR): at each generation step, we monitor the residual stream at a critical layer $l_{\text{crit}}$,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-20 · Daniela Baiamonte, Elena Fano, Matteo Gabburo, Stefano Simonazzi, Leonardo Rigutini, Andrea Zugarini
General AI
Vision Language Models (VLMs) have achieved rapid progress in recent years. However, despite their growth, VLM development is heavily grounded in English, leading to two main limitations: (i) the lack of multilingual and multimodal datasets for training, and (ii) the scarcity of comprehensive evaluation benchmarks acro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-21 · Xianming Li, Zongxi Li, Tsz-fung Andrew Lee, Jing Li, Haoran Xie, Qing Li
General AI
Parameter-efficient fine-tuning (PEFT) reduces the training cost of full-parameter fine-tuning for large language models (LLMs) by training only a small set of task-specific parameters while freezing the pretrained backbone. However, existing approaches, such as Low-Rank Adaptation (LoRA), achieve adaptation by inserti…
- Review
- pending
- Role
- unreviewed
- Read
- now
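The cost saving that PEFT abstracts like the one above refer to comes down to simple parameter counting. A minimal sketch, assuming the standard LoRA parameterization (a frozen d_out × d_in weight plus a trainable rank-r product B·A) rather than the paper's own method, which the truncated abstract does not describe; the hidden size 4096 and rank 8 below are hypothetical values chosen for illustration:

```python
def full_ft_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning a dense d_out x d_in weight directly."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-r update B @ A (B: d_out x r, A: r x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d = 4096  # hypothetical hidden size
    r = 8     # hypothetical LoRA rank
    full = full_ft_params(d, d)
    lora = lora_params(d, d, r)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

For a square 4096 × 4096 projection at rank 8, the trainable update is r(d_out + d_in) = 65,536 parameters versus 16,777,216 for full fine-tuning, about 0.39% of the layer.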
arxiv
Score 11.3
2026-04-22 · Hanqi Li, Lu Chen, Kai Yu
General AI
As LLMs are increasingly integrated into agentic systems, they must adhere to dynamically defined, machine-interpretable interfaces. We evaluate LLMs as in-context interpreters: given a novel context-free grammar, can LLMs generate syntactically valid, behaviorally functional, and semantically faithful outputs? We intr…
- Review
- pending
- Role
- unreviewed
- Read
- now
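Checking whether an output is syntactically valid under a given context-free grammar is a classic parsing problem. A minimal sketch using the CYK algorithm, assuming the grammar is already in Chomsky normal form; the toy balanced-parentheses grammar and the helper name `cyk_accepts` are hypothetical and not taken from the paper's benchmark:

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (nonempty balanced parentheses):
#   S -> S S | L R | L X      X -> S R      L -> '('      R -> ')'
BINARY = {
    ("S", "S"): {"S"},
    ("L", "R"): {"S"},
    ("L", "X"): {"S"},
    ("S", "R"): {"X"},
}
TERMINAL = {"(": {"L"}, ")": {"R"}}


def cyk_accepts(tokens, binary=BINARY, terminal=TERMINAL, start="S"):
    """Return True if the token sequence is derivable from `start`."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive tokens[i : i + j + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = set(terminal.get(tok, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = table[i][length - 1]
            # Try every split of the span into a left part and a right part.
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    cell |= binary.get((b, c), set())
    return start in table[0][n - 1]
```

CYK runs in O(n³·|G|) time and only decides syntactic validity; the behavioral and semantic checks the abstract mentions would need separate tests.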
arxiv
Score 11.3
2026-04-23 · Irene Aldridge, Jolie An, Riley Burke, Michael Cao, Chia-Yi Chien, Kexin Deng, Ruipeng Deng, Yichen Gao, Olivia Guo, Shunran He, Zheng Li, George Lin, Weihang Lin, Percy Lyu, Alex Ng, Qi Wang, Hanxi Xiao, Dora Xu, Yuanyuan Xue, Sheng Zhang, Sirui Zhang, Yun Zhang, Sirui Zhao, Xiaolong Zhao, Yihan Zhao, Waner Zheng
General AI
The emergence of agentic artificial intelligence (AI) represents a fundamental transformation in financial markets, characterized by autonomous systems capable of reasoning, planning, and adaptive decision-making with minimal human intervention. This comprehensive survey synthesizes recent advances in agentic AI across…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-23 · Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu
General AI
Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated ta…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-23 · Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong, Jaehong Kim, Meeyoung Cha
General AI
Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-23 · Naheed Rayhan, Sohely Jahan
General AI
Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection(TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial intent across …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-24 · Jiajun Yu, Guodong Liu, Li Wang, Pengxiang Zhou, Wentao Liu, Yin He, Chao Xu, Fei Gao, Yanjun Cao
General AI
Parallel trajectory optimization via the Alternating Direction Method of Multipliers (ADMM) has emerged as a scalable approach to long-horizon motion planning. However, existing frameworks typically decompose the problem into parallel subproblems based on a predefined fixed structure. Such structural rigidity often cau…
- Review
- pending
- Role
- unreviewed
- Read
- now
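The decomposition idea behind ADMM can be seen on a toy consensus problem: each subproblem holds a private target a_i, the local updates are independent (and hence parallelizable), and a global averaging step plus dual updates enforce agreement. This is a textbook scaled-form consensus ADMM, not the paper's adaptive decomposition; the quadratic objective ½(x − a_i)² and the function name are assumptions for illustration:

```python
def consensus_admm(targets, rho=1.0, iters=200, tol=1e-10):
    """Scaled-form consensus ADMM for min_x sum_i 0.5 * (x - a_i)^2.

    Each local variable x_i only sees its own target a_i; the constraint
    x_i = z forces all local copies to agree on a global value z.
    """
    n = len(targets)
    x = [0.0] * n
    u = [0.0] * n  # scaled dual variables, one per constraint x_i = z
    z = 0.0
    for _ in range(iters):
        # Local step: argmin_x 0.5*(x - a_i)^2 + (rho/2)*(x - z + u_i)^2.
        # Each update uses only (a_i, u_i, z), so they could run in parallel.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        z_old = z
        # Global step: average the local estimates shifted by their duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate the remaining disagreement x_i - z.
        u = [ui + xi - z for ui, xi in zip(u, x)]
        primal_residual = max(abs(xi - z) for xi in x)
        dual_residual = rho * abs(z - z_old)
        if primal_residual < tol and dual_residual < tol:
            break
    return z


if __name__ == "__main__":
    print(consensus_admm([1.0, 2.0, 6.0]))  # should approach the mean, 3.0
```

Minimizing Σ_i ½(x − a_i)² has the closed-form answer mean(a_i), which makes the toy easy to check; in a trajectory-optimization setting the subproblems would typically correspond to trajectory segments coupled through shared states.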
arxiv
Score 11.3
2026-04-24 · Longju Bai, Zhemin Huang, Xingyao Wang, Jiao Sun, Rada Mihalcea, Erik Brynjolfsson, Alex Pentland, Jiaxin Pei
General AI
The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questions naturally arise: (1) Where do AI agents spend the tokens? (2) Which models are more token-efficient? and (3) Can agen…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-27 · Parsa Ashrafi Fashi, Utkarsh Saxena, Mehdi Rezagholizadeh, Aref Jafari, Akash Haridas, Mingyu Yang, Vansh Bhatia, Guihong Li, Vikram Appia, Emad Barsoum
General AI
Hybrid sequence models that combine efficient Transformer components with linear sequence modeling blocks are a promising alternative to pure Transformers, but most are still pretrained from scratch and therefore fail to reuse existing Transformer checkpoints. We study upcycling as a practical path to convert pretraine…
- Review
- pending
- Role
- unreviewed
- Read
- now
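The "linear sequence modeling blocks" mentioned above replace softmax attention with an operator that can be computed as a recurrence over a fixed-size state. A minimal sketch of unnormalized causal linear attention (no feature map, gating, or normalization, which practical blocks typically add), showing that the recurrent form matches the quadratic "attention" form; this is a generic illustration, not the paper's upcycling recipe:

```python
def linear_attention_recurrent(Q, K, V):
    """Causal, unnormalized linear attention as a recurrence over a d_k x d_v state."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        # State update: S <- S + k v^T (memory independent of sequence length).
        for a in range(d_k):
            for b in range(d_v):
                S[a][b] += k[a] * v[b]
        # Readout: y_t = q_t^T S_t.
        out.append([sum(q[a] * S[a][b] for a in range(d_k)) for b in range(d_v)])
    return out


def linear_attention_parallel(Q, K, V):
    """Same quantity in attention form: y_t = sum_{s <= t} (q_t . k_s) v_s."""
    out = []
    for t, q in enumerate(Q):
        y = [0.0] * len(V[0])
        for s in range(t + 1):
            w = sum(qa * ka for qa, ka in zip(q, K[s]))  # unnormalized score
            y = [yb + w * vb for yb, vb in zip(y, V[s])]
        out.append(y)
    return out
```

The recurrent form keeps a d_k × d_v state regardless of sequence length, which is where the efficiency gain over softmax attention's growing key-value cache comes from.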
arxiv
Score 11.3
2026-04-27 · Zhenyu Zhao, Aparna Balagopalan, Adi Agrawal, Dilshoda Yergasheva, Waseem Alshikh, Daniel M. Bikel
General AI
Given the increased use of LLMs in financial systems today, it becomes important to evaluate the safety and robustness of such systems. One failure mode that LLMs frequently display in general domain settings is that of sycophancy. That is, models prioritize agreement with expressed user beliefs over correctness, leadi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-05-11 · Wei Chow, Linfeng Li, Xian Sun, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, Tian Ye, Xian Wang, Jinbin Bai, Shilin Xu, Xiangtai Li, Junting Pan, Shaoteng Liu, Ran Zhou, Tianshu Yang, Songhua Liu
Research Track A · General AI
Diffusion models dominate image editing, yet their global denoising mechanism entangles edited regions with surrounding context, causing modifications to propagate into areas that should remain intact. We propose a fundamentally different approach by leveraging Masked Generative Transformers (MGTs), whose localized tok…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.3
2026-05-12 · Bokang Yang, Xinyi Sun, Kaituo Feng, Xingping Dong, Dongming Wu, Xiangyu Yue
General AI
Visual perception connects high-level semantic understanding to pixel-level perception, but most existing settings assume that the decisive evidence for identifying a target is already in the image or frozen model knowledge. We study a more practical yet harder open-world case where a visible object must first be resol…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.3
2026-05-12 · Gunjan, Sidahmed Benabderrahmane, Talal Rahwan
General AI
Large Language Models (LLMs) can generate fluent political text at scale, raising concerns about synthetic discourse during crises and social conflict. Existing AI-text detection often focuses on sentence-level cues such as perplexity, burstiness, or token irregularities, but these signals may weaken as generative syst…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.2
2026-04-29 · My Thi Diem Phan, Trung Tuyen Truong, Hoai Phuong Ha, Dat Thanh Nguyen
General AI
Norway's electricity market is heavily dominated by hydropower, but the 2021–2022 energy crisis and stronger integration with Continental Europe have fundamentally altered price formation, reducing the reliability of forecasting models calibrated on historical data. Despite the critical need for updated models, a unif…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-04-29 · Darren Fürst, Sebastian Steindl, Ulrich Schäfer
General AI
Speech Sound Disorders (SSD) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular multi-task SLPHelmUltraSuitePlus benchmark. We propose a cascading approach from binary …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-04-29 · Raj Kumar Ranabhat, Tayler D Ross, Tony Jiao, Jeremie Larouche, Joel Finkelstein, Michael Hardisty
General AI
Surgical training involves didactic teaching, mentor-led learning, surgical skills laboratories, and direct exposure to surgery; however, increasing clinical pressures have limited operating room (OR) exposure. This work leverages virtual reality (VR) to provide a safe and immersive training environment. Existing VR tr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-04-29 · Wanyue Zhang, Wenxiang Wu, Wang Xu, Jiaxin Luo, Helu Zhi, Yibin Huang, Shuo Ren, Zitao Liu, Jiajun Zhang
General AI
Vision-language models (VLMs) have shown strong performance on static visual understanding, yet they still struggle with dynamic spatial reasoning that requires imagining how scenes evolve under egocentric motion. Recent efforts address this limitation either by scaling spatial supervision with synthetic data or by cou…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-04-30 · Andac Demir, Erik W. Anderson, Jeremy L. Jenkins, Srayanta Mukherjee
General AI
In this work, we introduce CellxPert, a scalable multimodal foundation model that unifies single-cell and spatial multi-omics within a common representation space. CellxPert jointly encodes transcriptomic (scRNA-seq), chromatin-accessibility (ATAC-seq), and surface-proteomic (CITE-seq) measurements, while directly inco…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.2
2026-04-30 · Chenxin Li, Zhengyang Tang, Huangxin Lin, Yunlong Lin, Shijue Huang, Shengyuan Liu, Bowen Ye, Rang Li, Lei Li, Benyou Wang, Yixuan Yuan
General AI
LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-04-30 · Han Liu, Shanghao Shi, Yevgeniy Vorobeychik, Chongjie Zhang, Ning Zhang
General AI
Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved the training efficiency of Large Language Models (LLMs) by updating neural network layers using low-rank matrices. Since the generation of adversarial examples is an optimiz…
- Review
- pending
- Role
- unreviewed
- Read
- now
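The low-rank update the abstract describes can be written as h = Wx + (α/r)·BAx, where W is frozen and only A (r × d_in) and B (d_out × r) are trained. A minimal pure-Python sketch of that standard LoRA formulation, showing that the adapter can be merged into W after training so inference costs nothing extra; the matrices and the α and r values are hypothetical, and nothing here reflects the paper's adversarial-example method:

```python
def matvec(M, x):
    """Dense matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def matmul(A, B):
    """Dense matrix-matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""
    scale = alpha / r
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # project down to rank r, then back up
    return [h + scale * d for h, d in zip(base, delta)]


def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into the base weight: W' = W + (alpha / r) * B A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]
```

Because LoRA is conventionally initialized with B = 0, the adapted layer starts out identical to the frozen one.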
arxiv
Score 11.2
2026-04-30 · Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin
General AI
Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are increasingly applied to critical tasks such as function and variable name recovery and type inference. However, despite the…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-04-30 · Zainab Rehan, Christian Medeiros Adriano, Sona Ghahremani, Holger Giese
General AI
Rule-based systems remain central in safety-critical domains but often struggle with scalability, brittleness, and goal misspecification. These limitations can lead to reward hacking and failures in formal verification, as AI systems tend to optimize for narrow objectives. In previous research, we developed a neuro-sym…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-05-01 · Yutao Hou, Yihan Jiang, Yuhan Xie, Jian Yang, Liwen Zhang, Hailiang Huang, Guanhua Chen, Yun Chen
General AI
Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-05-01 · Theodore Papamarkou, Pierre Alquier, Matthias Bauer, Wray Buntine, Andrew Davison, Gintare Karolina Dziugaite, Maurizio Filippone, Andrew Y. K. Foong, Vincent Fortuin, Dimitris Fouskakis, Jes Frellsen, Eyke Hüllermeier, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Nikita Kotelevskii, Salem Lahlou, Yingzhen Li, Fang Liu, Clare Lyle, Thomas Möllenhoff, Konstantina Palla, Maxim Panov, Yusuf Sale, Kajetan Schweighofer, Artem Shelmanov, Siddharth Swaroop, Martin Trapp, Willem Waegeman, Andrew Gordon Wilson, Alexey Zaytsev
General AI
LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-05-02 · Zhiwen Ruan, Yichao Du, Jianjie Zheng, Longyue Wang, Yun Chen, Peng Li, Jinsong Su, Yang Liu, Guanhua Chen
General AI
A promising paradigm for adapting instruction-tuned language models is to learn task-specific updates on a pretrained base model and subsequently merge them into the instruction-tuned model. However, existing approaches typically treat the instruction-tuned model as a passive target that is only involved at the final m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-05-03 · Zongqian Li, Yixuan Su, Han Zhou, Zihao Fu, Nigel Collier
General AI
Parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) have become essential for deploying large language models, yet their static parameter allocation remains suboptimal for inputs of varying complexity. We present Flexi-LoRA, a novel framework that dynamically adjusts LoRA ranks based on input comple…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-05-03 · Yiyao Wang, Sixian Zhang, Keming Zhang, Xinhang Song, Songjie Du, Shuqiang Jiang
General AI
Existing zero-shot Object Goal Navigation (ObjectNav) methods often exploit commonsense knowledge from large language or vision-language models to guide navigation. However, such knowledge arises from internet-scale text rather than embodied 3D experience, and episodic observations collected during navigation are typic…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-05-04 · Yu-Ju Tsai, Brian Price, Qing Liu, Luis Figueroa, Daniil Pakhomov, Zhihong Ding, Scott Cohen, Ming-Hsuan Yang
General AI
Personalized image completion aims to restore occluded regions in personal photos while preserving identity and appearance. Existing methods either rely on generic inpainting models that often fail to maintain identity consistency, or assume that suitable reference images are explicitly provided. In practice, suitable …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-05-04 · Frederic Grabowski, Jacek Szczerbiński, Maciej Jaśkowski, Kalina Jasińska-Kobus, Paweł Dąbrowski-Tumański, Tomasz Jetka, Bartosz Topolski
General AI
Molecular property models increasingly support high-stakes drug-discovery decisions, but their outputs are often difficult to audit: classical predictors return scores without rationale, while language models can produce fluent explanations weakly grounded in the input molecule. We introduce Bolek, a compact multimodal…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-05-04 · Danil Tokhchukov, Veronika Morozova, Gonzalo Ferrer
General AI
Traditional Simultaneous Localization and Mapping (SLAM) algorithms rely heavily on the static environment assumption, which severely limits their applicability in real-world spaces populated by moving entities, such as pedestrians. In this work, we propose DynoSLAM, a tightly-coupled Dynamic GraphSLAM architecture tha…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-03-11 · Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi
General AI
A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and tha…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-03-27 · Nicholas Edwards, Sebastian Schuster
General AI
As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally resolve underspecification by asking clarifying questions, current agents are largely optimize…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-03-28 · Zhuoyang Qian, Wei Shi, Xu Lin, Li Ling, Meng Luo, Ziming Wang, Zhiwei Zhang, Tengyue Xu, Gaoge Liu, Zhentao Zhang, Shuo Zhang, Ziqi Wang, Zheng Feng, Yan Luo, Shu Xu, Yongjin Chen, Zhibo Feng, Zhuo Chen, Bruce Yuan, Biao Wu, Harry Wang, Kris Chen
General AI
Generating scientific manuscripts requires maintaining alignment between narrative reasoning, experimental evidence, and visual artifacts across the document lifecycle. Existing language-model generation pipelines rely on unconstrained text synthesis with validation applied only after generation, often producing struct…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-03-31 · Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, Pengfei Liu
General AI
Can AI accelerate the development of AI itself? While recent agentic systems have shown strong performance on well-scoped tasks with rapid feedback, it remains unclear whether they can tackle the costly, long-horizon, and weakly supervised research loops that drive real AI progress. We present ASI-Evolve, an agentic fr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-03-31 · Wenli Li, Kai Zhao, Haoran Jiang, Enquan Yang, Yi Su, Dan Zeng
General AI
Vision-language models (VLMs) have been widely adopted for 3D question answering (3D QA). In typical pipelines, visual tokens extracted from multiple viewpoints are concatenated with language tokens and jointly processed by a large language model (LLM) for inference. However, aggregating multi-view observations inevita…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-04-01 · Benjamin Turtel, Paul Wilczewski, Kris Skotheim
General AI
Anticipating supply chain disruptions before they materialize is a core challenge for firms and policymakers alike. A key difficulty is learning to reason reliably about infrequent, high-impact events from noisy and unstructured inputs, a setting where general-purpose models struggle without task-specific adaptation. …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.0
2026-04-17 · Ulrich Tan
Research Track A · General AI
We introduce the Tan-HWG framework (Hebbian-Wasserstein-Geometry), a geometric theory of Hebbian plasticity in which memory states are modeled as probability measures evolving through Wasserstein minimizing movements. Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition,…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-04-23 · Mohammed Safi Ur Rahman Khan, Sanjay Suryanarayanan, Tushar Anand, Mitesh M. Khapra
General AI
Large Vision-Language Models (VLMs) are increasingly used to evaluate outputs of other models, for image-to-text (I2T) tasks such as visual question answering, and text-to-image (T2I) generation tasks. Despite this growing reliance, the reliability of these Evaluator VLMs remains underexplored. In this work, we system…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-04-25 · Yihan Wang, Lei Li, Yao Lai, Jing Wang, Yan Lu
General AI
Analog circuit design relies heavily on reusing existing intellectual property (IP), yet searching across heterogeneous representations such as SPICE netlists, schematics, and functional descriptions remains challenging. Existing methods are largely limited to exact matching within a single modality, failing to capture…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 11.0
2026-04-27 · Hongxin Li, Xiping Wang, Jingran Su, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang
Research Track B · General AI
Autonomous agents capable of navigating Graphical User Interfaces (GUIs) hold the potential to revolutionize digital productivity. However, achieving true digital autonomy extends beyond reactive element matching; it necessitates a predictive mental model of interface dynamics and the ability to foresee the "digital wo…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-05-08 · Andrea Gurioli, Federico Pennino, Maurizio Gabbrielli
General AI
Embedding-based code retrieval often suffers when encoders overfit to surface syntax. Prior work mitigates this by using LLMs to rephrase queries and corpora into a normalized style, but leaves two questions open: how much representational shift helps, and when is the per-query LLM call justified? We study a hierarchy …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.8
2026-02-01 · Alberto Castelo, Zahra Zanjani Foumani, Ailin Fan, Keat Yang Koay, Vibhor Malik, Yuanzheng Zhu, Han Li, Meysam Feghhi, Ronie Uliana, Shuang Xie, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Lingyun Wang, Zhong Wu
Research Track B · General AI
A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model agents op…
- Review
- pending
- Role
- unreviewed
- Read
- soon
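Why an online A/B test "takes weeks to reach significance" follows from the standard normal-approximation sample-size formula for comparing two conversion rates. A minimal sketch; the 3% baseline, the 10% relative lift, and the daily-traffic figure are hypothetical inputs, not numbers from the SimGym paper:

```python
from math import ceil
from statistics import NormalDist


def samples_per_arm(p_control: float, p_treatment: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per arm for a two-sided two-proportion z-test (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    n = samples_per_arm(0.03, 0.033)   # hypothetical: 3% baseline, +10% relative lift
    daily_visitors_per_arm = 4_000     # hypothetical traffic split
    print(n, "users per arm,", ceil(n / daily_visitors_per_arm), "days")
```

Detecting a 3.0% to 3.3% change needs roughly 53,000 users per arm, about two weeks at 4,000 visitors per arm per day; smaller lifts grow the requirement with the inverse square of the effect size.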
arxiv
Score 10.8
2026-03-24 · Qianlong Lan, Anuj Kaul
Research Track B · General AI
Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage spli…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-03-26 · Linyue Pan, Lexiao Zou, Shuo Guo, Jingchen Ni, Hai-Tao Zheng
General AI
Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead be externaliz…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-03-26 · Liping Yi, Zhiming Zhao, Qinghua Hu
General AI
Social learning highlights that learning agents improve not in isolation, but through interaction and structured knowledge exchange with others. When introduced into machine learning, this principle gives rise to social machine learning (SML), where multiple agents collaboratively learn by sharing abstracted knowledge.…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-03-26 · Jinbo Xing, Zeyinzi Jiang, Yuxiang Tuo, Chaojie Mao, Xiaotang Gai, Xi Chen, Jingfeng Zhang, Yulin Pan, Zhen Han, Jie Xiao, Keyu Yan, Chenwei Xie, Chongyang Zhong, Kai Zhu, Tong Shen, Lianghua Huang, Yu Liu, Yujiu Yang
General AI
Recent unified models have made unprecedented progress in both understanding and generation. However, while most of them accept multi-modal inputs, they typically produce only single-modality outputs. This challenge of producing interleaved content is mainly due to training data scarcity and the difficulty of modeling …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-03-30 · Min Wang, Ata Mahjoubfar
General AI
Agentic vision-language models increasingly act through extended interactions, but most evaluations still focus on single-image, single-turn correctness. We introduce AMIGO (Agentic Multi-Image Grounding Oracle Benchmark), a long-horizon benchmark for hidden-target identification over galleries of visually similar imag…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-03-30 · Yanyan Yan, Yang Feng, Jiangshan Liu, Di Liu, Zixi Liu, Hao Teng, Baowen Xu
General AI
The growing adoption of Rust for its memory safety and performance has increased the demand for effective migration of legacy C codebases. However, existing rule-based translators (e.g., C2Rust) often generate verbose, non-idiomatic code that preserves unsafe C semantics, limiting readability, maintainability, and pr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-03-31 · Mst. Fahmida Sultana Naznin, Adnan Ibney Faruq, Mushfiqur Rahman, Niloy Kumar Mondal, Md. Mehedi Hasan Shawon, Md Rakibul Hasan
General AI
Automated radiology report summarization aims to distill verbose findings into concise clinical impressions, but existing multimodal models often struggle with visual noise and fail to meaningfully improve over strong text-only baselines in the FINDINGS → IMPRESSION transformation. We challenge two prevailing assum…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-03-31 · Zhuowen Liang, Xiaotian Lin, Zhengxuan Zhang, Yuyu Luo, Haixun Wang, Nan Tang
General AI
Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, graph, or chunks) to support r…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-03-31 · Kaleb Newman, Tyler Zhu, Olga Russakovsky
General AI
Video diffusion models exhibit emergent reasoning capabilities like solving mazes and puzzles, yet little is understood about how they reason during generation. We take a first step towards understanding this and study the internal planning dynamics of video models using 2D maze solving as a controlled testbed. Our inv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-02 · Qiyao Zhang, Shuhua Zheng, Jianli Sun, Chengxiang Li, Xianke Wu, Zihan Song, Zhiyong Cui, Yisheng Lv, Yonglin Tian
General AI
Embodied visual tracking is crucial for Unmanned Aerial Vehicles (UAVs) executing complex real-world tasks. In dynamic urban scenarios with complex semantic requirements, Vision-Language-Action (VLA) models show great promise due to their cross-modal fusion and continuous action generation capabilities. To benchmark mu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-02 · Syed Ahmed, Bharathi Vokkaliga Ganesh, Jagadish Babu P, Karthick Selvaraj, Praneeth Talluri, Sanket Hingne, Anubhav Kumar, Anushka Yadav, Pratham Kumar Verma, Kiranmayee Janardhan, Mandanna A N
General AI
Understanding how Large Language Models (LLMs) process information from prompts remains a significant challenge. To shed light on this "black box," attention visualization techniques have been developed to capture neuron-level perceptions and interpret how models focus on different parts of input data. However, many ex…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-03 · Yunfei Bai, Amit Dhanda, Shekhar Jain
General AI
The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension, particularly for Chart Question Answering (CQA) tasks involving complex data vi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-06 · Songyuan Yang, Huibin Tan, Kailun Yang, Wenjing Yang, Shaowu Yang
General AI
We introduce AnyUser, a unified robotic instruction system for intuitive domestic task instruction via free-form sketches on camera images, optionally with language. AnyUser interprets multimodal inputs (sketch, vision, language) as spatial-semantic primitives to generate executable robot actions requiring no prior map…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-09 · Yuxuan Zhang, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao, Xuan Lu, Wendong Xu, Yunzhuo Hao, Songcheng Cai, Xiaochen Wang, Huaisong Zhang, Xian Wu, Yi Lu, Minyi Lei, Kai Zou, Huifeng Yin, Ping Nie, Liang Chen, Dongfu Jiang, Wenhu Chen, Kelsey R. Allen
General AI
AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that people need to accom…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-28 · Jiahang Lin, Shichun Liu, Chengjun Pan, Lizhi Lin, Shihan Dou, Xuanjing Huang, Hang Yan, Zhenhua Han, Tao Gui
General AI
Harnesses have become a central determinant of coding-agent performance, shaping how models interact with repositories, tools, and execution environments. Yet automating harness engineering is hard: a heterogeneous action space, sparse and noisy evaluation signal, multi-million-token trajectories, and edits whose effec…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-28 · Ajmain Inqiad Alam, Palash Roy, Chanchal K. Roy, Banani Roy, Kevin A. Schneider
General AI
The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities in different SE tasks, they are unmanageably large, slow to deploy, memory-intensive, and carbon-heavy. Thi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-28 · Jinxiang Meng, Shaoping Huang, Fangyu Lei, Jingyu Guo, Haoxiang Liu, Jiahao Su, Sihan Wang, Yao Wang, Enrui Wang, Ye Yang, Hongze Chai, Jinming Lv, Anbang Yu, Huangjing Zhang, Yitong Zhang, Yiming Huang, Zeyao Ma, Shizhu He, Jun Zhao, Kang Liu
General AI
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and assumption of perfect intent. To bridge these gaps, we introduce DV-World, a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-28 · Pahal D. Patel, Sanmay Ganguly
General AI
Graph neural networks such as ParticleNet and transformer based networks on point clouds such as ParticleTransformer achieve state-of-the-art performance on jet tagging benchmarks at the Large Hadron Collider, yet the physical reasoning behind their predictions remains opaque. We present different methods, i.e. perturb…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-04-28 · Nazia Shehnaz Joynab, Soneya Binta Hossain
General AI
Resolution of complex post-production issues in large-scale open-source software (OSS) projects requires significant cognitive effort, as developers must first work through long, unstructured, and fragmented issue discussion threads. In this paper, we present SWE-MIMIC-Bench, an issue trajectory dataset generated…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-06 · Srikar Kashyap Pulipaka
General AI
We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language mode…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.8
2026-05-07 · Xiaofang Xiao, Guangchao Li, Guangrong Zhao, Qi Lin, Wen Ma, Hongkai Wen, Yanxiang Wang, Yiran Shen
General AI
Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.5
2026-02-10 · Talor Abramovich, Maor Ashkenazi, Carl Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman
General AI
Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and representative workloads are essential for accurately measuring its effectiveness. Existin…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.5
2026-03-11 · Hyungjoo Chae, Jungsoo Park, Alan Ritter
Research Track B · General AI
Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites in…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.5
2026-03-22 · Alfred Shen, Aaron Shen
Research Track A · General AI
Current AI agent frameworks commit early to a single interaction protocol, a fixed tool integration strategy, and static user models, limiting their deployment across diverse interaction paradigms. To address these constraints, we introduce STEM Agent (Self-adapting, Tool-enabled, Extensible, Multi-agent), a modular ar…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.5
2026-04-02 · Kang-Sin Choi
Research Track A · General AI
We propose LSCP, a self-gated post-training framework for autonomous knowledge acquisition: learning only what a model does not already know, verified against what it does know, at a strength proportional to conviction, with no external oracle. When a passage produces anomalously high per-token loss, LSCP flags it, gen…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-08 · Jiaming Cheng, Duong Tung Nguyen
Research Track A · General AI
Deploying large language model (LLM) inference at scale requires jointly selecting base models, provisioning heterogeneous GPUs, configuring parallelism, and distributing workloads under tight latency, accuracy, and budget constraints. Exact mixed-integer linear programming (MILP) approaches guarantee optimality but sc…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-12 · Song Jin, Juntian Zhang, Xun Zhang, Zeying Tian, Fei Jiang, Guojun Yin, Wei Lin, Yong Liu, Rui Yan
General AI
Recent advancements in Vision-Language Models (VLMs) have revolutionized general visual understanding. However, their application in the food domain remains constrained by benchmarks that rely on coarse-grained categories, single-view imagery, and inaccurate metadata. To bridge this gap, we introduce DiningBench, a hie…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-12 · Udari Madhushani Sehwag, Elaine Lau, Haniyeh Ehsani Oskouie, Shayan Shabihi, Erich Liang, Andrea Toledo, Guillermo Mangialardi, Sergio Fonrouge, Ed-Yeremai Hernandez Cardona, Paula Vergara, Utkarsh Tyagi, Chen Bo Calvin Zhang, Pavi Bhatter, Nicholas Johnson, Furong Huang, Ernesto Gabriel Hernandez Montoya, Bing Liu
General AI
Accelerating scientific discovery requires the identification of which experiments would yield the best outcomes before committing resources to costly physical validation. While existing benchmarks evaluate LLMs on scientific knowledge and reasoning, their ability to predict experimental outcomes - a task where AI coul…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-15 · Team HY-World, Chenjie Cao, Xuhui Zuo, Zhenwei Wang, Yisu Zhang, Junta Wu, Zhenyang Liu, Yuning Gong, Yang Liu, Bo Yuan, Chao Zhang, Coopers Li, Dongyuan Guo, Fan Yang, Haiyu Zhang, Hang Cao, Jianchen Zhu, Jiaxin Lin, Jie Xiao, Jihong Zhang, Junlin Yu, Lei Wang, Lifu Wang, Lilin Wang, Linus, Minghui Chen, Peng He, Penghao Zhao, Qi Chen, Rui Chen, Rui Shao, Sicong Liu, Wangchen Qin, Xiaochuan Niu, Xiang Yuan, Yi Sun, Yifei Tang, Yifu Sun, Yihang Lian, Yonghao Tan, Yuhong Liu, Yuyang Yin, Zhiyuan Min, Tengfei Wang, Chunchao Guo
General AI
We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view image inputs, the mo…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-16 · Quyen Tran, Hai Nguyen, Hoang Phan, Quan Dao, Linh Ngo, Khoat Than, Dinh Phung, Dimitris Metaxas, Trung Le
General AI
In online incremental learning, data continuously arrives with substantial distributional shifts, creating a significant challenge because previous samples have limited replay value when learning a new task. Prior research has typically relied on either a single adaptive centroid or multiple fixed centroids to represen…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-18 · Xinru Yan, Boxi Cao, Yaojie Lu, Hongyu Lin, Weixiang Zhou, Le Sun, Xianpei Han
General AI
Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this gap, we first systematically quantify modality preference of OLLMs using …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-19 · Liyang Wang, Zeyu Zhang, Hao Tang
General AI
Scene graph representations enable structured visual understanding by modeling objects and their relationships, and have been widely used for multiview and 3D scene reasoning. Existing methods such as MSG learn scene graph embeddings in Euclidean space using contrastive learning and attention-based association. However…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-20 · Weixi Tong, Yifeng Di, Tianyi Zhang
Research Track B · General AI
Existing web agents typically initiate exploration from the root URL, which is inefficient for complex websites with deep hierarchical structures. Without a global view of the website's structure, agents frequently fall into navigation traps, explore irrelevant branches, or fail to reach target information within a lim…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.5
2026-04-20 · Yu Zhang, Chuyang Sun, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang
General AI
Large Vision-Language Models (LVLMs) still struggle with vision hallucination, where generated responses are inconsistent with the visual input. Existing methods either rely on large-scale annotated data for fine-tuning, which incurs massive computational overhead, or employ static post-hoc strategies that overlook the…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-22 · Hardy Chen, Nancy Lau, Haoqin Tu, Shuo Yan, Xiangyan Liu, Zijun Wang, Juncheng Wu, Michael Qizhe Shieh, Alvaro A. Cardenas, Cihang Xie, Yuyin Zhou
General AI
Frontier coding agents are increasingly used in workflows where users supervise progress primarily through repeated improvement of a public score, namely the reported score on a public evaluation file with labels in the workspace, rather than through direct inspection of the agent's intermediate outputs. We study wheth…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-23 · Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna
Research Track A · General AI
On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative thr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-24 · Zhengxu Yu, Yu Fu, Zhiyuan He, Yuxuan Huang, Lee Ka Yiu, Meng Fang, Weilin Luo, Jun Wang
General AI
Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. We argue that this reflects a deeper absence: a principled organisational layer that gove…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-25 · Yizheng Huang, Wenjun Zeng, Aditi Kumaresan, Zi Wang
General AI
Evaluating generative AI models is increasingly resource-intensive due to slow inference, expensive raters, and a rapidly growing landscape of models and benchmarks. We propose ProEval, a proactive evaluation framework that leverages transfer learning to efficiently estimate performance and identify failure cases. ProE…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-26 · Tin Nguyen, Thang T. Truong, Runtao Zhou, Trung Bui, Chirag Agarwal, Anh Totti Nguyen
Research Track B · General AI
Users browsing the web daily struggle to quickly locate relevant information in cluttered pages, complete unfamiliar multi-step tasks, and stay focused amid distracting content. State-of-the-art AI assistants (e.g., ChatGPT, Gemini, Claude) and browser agents (e.g., OpenAI Operator, Browser Use) can answer questions an…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-26 · Qi Li, Bo Yin, Weiqi Huang, Ruhao Liu, Bojun Zou, Runpeng Yu, Jingwen Ye, Weihao Yu, Xinchao Wang
General AI
Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language, and state, real-time…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-27 · Yiming Zhang, Jiacheng Chen, Jiaqi Tan, Yongsen Mao, Wenhu Chen, Angel X. Chang
General AI
Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally curated for traditional 3D perception. When such annotations are treated as ground truth …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-05-06 · Sohom Datta, Alex Nahapetyan, William Enck, Alexandros Kapravelos
Research Track B · General AI
Large language models (LLMs) are increasingly being integrated into web browsers to create agentic browsing systems that execute actions on behalf of the user. Prior work considering the security of agentic browsers focuses exclusively on indirect prompt-injection attacks. However, by failing to consider traditional we…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.5
2026-05-12 · Xinjie Shen, Rongzhe Wei, Peizhi Niu, Haoyu Wang, Ruihan Wu, Eli Chien, Bo Li, Pin-Yu Chen, Pan Li
General AI
Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful objective in a single prompt, increasingly capable attackers can distribute their intent across multiple benign-looking turns. Recent studies show that even modern commercial mo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.3
2026-04-08 · Jagadeesh Chundru
Research Track B · General AI
LLM-driven web agents operating through continuous inference loops (repeatedly querying a model to evaluate browser state and select actions) exhibit a fundamental scalability constraint for repetitive tasks. We characterize this as the Rerun Crisis: the linear growth of token expenditure and API latency relative t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.3
2026-04-13 · Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong
General AI
Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-13 · Minh Le-Anh, Cuong Chi Le, Tien N. Nguyen
General AI
Automated Program Repair (APR) has recently benefited from large language models (LLMs). However, most LLM-based APR approaches still rely primarily on coarse end-to-end signals from test-suite outcomes to guide repair, providing limited insight into where a program's internal logic deviates from its intended behavior.…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-13 · Donghao Zhou, Guisheng Liu, Hao Yang, Jiatong Li, Jingyu Lin, Xiaohu Huang, Yichen Liu, Xin Gao, Cunjian Chen, Shilei Wen, Chi-Wing Fu, Pheng-Ann Heng
General AI
In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference images, audio, and pose. This task holds significant practical value for automating content creation in real-world applications, such as e-commer…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-13 · Federico Bottino, Carlo Ferrero, Nicholas Dosio, Pierfrancesco Beneventano
General AI
Organizational knowledge used by AI agents typically lacks epistemic structure: retrieval systems surface semantically relevant content without distinguishing binding decisions from abandoned hypotheses, contested claims from settled ones, or known facts from unresolved questions. We argue that the ceiling on organizat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-14 · Farbod Alinezhad, Jianfei Cao, Gary J. Young, Brady Post
General AI
Predicting counterfactual outcomes in longitudinal data, where sequential treatment decisions heavily depend on evolving patient states, is critical yet notoriously challenging due to complex time-dependent confounding and inadequate uncertainty quantification in existing methods. We introduce the Causal Diffusion Mode…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-14 · Yecheng Wu, Song Han, Hai Cai
General AI
On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, standard OPD requires a live teacher inference server throughout training, resulting in substantial infrastructure overhead. In this work, we investigate whether on-policy distillation can be performed of…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-14 · Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain
General AI
Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Although existing automated program repair techniques primarily focus on repairing memory corruption vulnerabilities, they struggle with logical vulnerabilities because of their …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-16 · Fabrizio Genilotti, Arianna Stropeni, Gionata Grotto, Francesco Borsatti, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto
General AI
The reliability of a machine vision system for autonomous driving depends heavily on its training data distribution. When a vehicle encounters significantly different conditions, such as atypical obstacles, its perceptual capabilities can degrade substantially. Unlike many domains where errors carry limited consequence…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-16 · Jianxuan Yang, Xinyue Guo, Zhi Cheng, Kai Wang, Lipan Zhang, Jinjie Hu, Qiang Ji, Yihua Cao, Yihao Meng, Zhaoyue Cui, Mengmei Liu, Meng Meng, Jian Luan
General AI
Recent advances in video-to-audio (V2A) generation enable high-quality audio synthesis from visual content, yet achieving robust and fine-grained controllability remains challenging. Existing methods suffer from weak textual controllability under visual-text conflict and imprecise stylistic control due to entangled tem…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-16 · Boyan Li, Ou Ocean Kun Hei, Yue Yu, Yuyu Luo
General AI
While Large Language Models (LLMs) demonstrate impressive proficiency in generating SQL queries, they fundamentally lack the capability to self-evaluate correctness without an execution oracle. This limitation creates a stark Generation-Selection Gap, where high potential accuracy (Pass@K) fails to translate into execu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-17 · Tianqi Luo, Leixian Shen, Yuyu Luo
General AI
Agentic visual analytics (VA) represents an emerging class of systems in which large language model (LLM)-driven agents autonomously plan, execute, evaluate, and iterate across the full visual analytics pipeline. By shifting users from low-level tool operations to high-level analytical goals expressed through natural l…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-17 · Thomas Bayer, Alexander Lohr, Sarah Weiß, Bernd Michelberger, Wolfram Höpken
General AI
Explaining Machine Learning (ML) results in a transparent and user-friendly manner remains a challenging task of Explainable Artificial Intelligence (XAI). In this paper, we present a method to enhance the interpretability of ML models by using a Knowledge Graph (KG). We store domain-specific data along with ML results…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-19 · Ziao Zhang, Kou Shi, Shiting Huang, Avery Nie, Yu Zeng, Yiming Zhao, Zhen Fang, Qishen Su, Haibo Qiu, Wei Yang, Qingnan Ren, Shun Zou, Wenxuan Huang, Lin Chen, Zehui Chen, Feng Zhao
Research Track A · General AI
As the capability frontier of autonomous agents continues to expand, they are increasingly able to complete specialized tasks through plug-and-play external skills. Yet current benchmarks mostly test whether models can use provided skills, leaving open whether they can discover skills from experience, repair them after…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-20 · Raghvendra Kumar, Devankar Raj, Sriparna Saha
General AI
India's linguistic landscape, spanning 22 scheduled languages and hundreds of marginalized dialects, has driven rapid growth in NLP datasets, benchmarks, and pretrained models. However, no dedicated survey consolidates resources developed specifically for Indian languages. Existing reviews either focus on a few high-re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-20 · Aditya Arora, Akshita Gupta, Pau Rodriguez, Marcus Rohrbach
General AI
Story Visualization aims to generate a sequence of images that faithfully depicts a textual narrative while preserving character identity, spatial configuration, and stylistic coherence as the narrative unfolds. Maintaining such cross-frame consistency has traditionally relied on explicit memory banks, architectural expan…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-20 · Weicheng Lin, Yi Zhang, Jiawei Dang, Liang-Jie Zhang
General AI
Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning method for large language models, with its effectiveness largely influenced by the allocation of ranks and scaling factors, as well as initialization. Existing LoRA variants typically address only one of these factors, often at the c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-21 · Yuan Zhuang, Yuexin Bian, Sihong He, Jie Feng, Qing Su, Songyang Han, Jonathan Petit, Shihao Ji, Yuanyuan Shi, Fei Miao
General AI
Scaling critic capacity is a promising direction for enhancing off-policy reinforcement learning (RL). However, larger critics are prone to overfitting and become unstable under replay-buffer-based bootstrap training. This paper leverages Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy critics. Our…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-21 · Mehdi Maboudi, Said Harb, Jackson Ferrao, Kourosh Khoshelham, Yelda Turkan, Karam Mawas
General AI
Point cloud registration involves aligning one point cloud with another or with a three-dimensional (3D) model, enabling the integration of multimodal data into a unified representation. This is essential in applications such as construction monitoring, autonomous driving, robotics, and virtual or augmented reality (VR…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-21 · Abhinav Agarwal
General AI
LLM-assisted defect discovery has a precision crisis: plausible-but-wrong reports overwhelm maintainers and degrade credibility for real findings. We present Refute-or-Promote, an inference-time reliability pattern combining Stratified Context Hunting (SCH) for candidate generation, adversarial kill mandates, context a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-23 · Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi
General AI
Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Therefore, it is necessary to develop automated event extraction approaches. However, existing datasets for algorithm development…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-23 · Alibay Osmanli, Zixu Cheng, Shaogang Gong
General AI
Physical video understanding requires more than naming an event correctly. A model can answer a question about pouring, sliding, or collision from textual regularities while still failing to localize the event in time or space. We introduce a grounded benchmark for physical video understanding that extends the what--wh…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-23 · Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma
General AI
How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual conc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-23 · Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu, Peng Di
General AI
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, extracting actionabl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-24 · Negar Arabzadeh, Andrew Drozdov, Michael Bendersky, Matei Zaharia
General AI
Large Language Models (LLMs) have made query reformulation ubiquitous in modern retrieval and Retrieval-Augmented Generation (RAG) pipelines, enabling the generation of multiple semantically equivalent query variants. However, executing the full pipeline for every reformulation is computationally expensive, motivating …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-24 · Hyo Jin Jon, Longbin Jin, Eun Yi Kim
General AI
CLIP has demonstrated strong generalization in visual domains through natural language supervision, even for video action recognition. However, most existing approaches that adapt CLIP for action recognition have primarily focused on temporal modeling, often overlooking spatial perception. In real-world scenarios, visu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-24 · Chengyang Li, Kaiyi Xiong, Yuan Xu, Lei Qian, Yizhou Wang, Wentao Zhu
General AI
Embodied foundation models have achieved significant breakthroughs in robotic manipulation, yet they still depend heavily on large-scale robot demonstrations. Although recent works have explored leveraging human data to alleviate this dependency, effectively extracting transferable knowledge remains a significant chall…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-24 · Manyi Zhang, Ji-Fu Li, Zhongao Sun, Xiaohao Liu, Zhenhua Dong, Xianzhi Yu, Haoli Bai, Xiaobo Xia
General AI
Autonomous agent systems such as OpenClaw introduce significant efficiency challenges due to long-context inputs and multi-turn reasoning. This results in prohibitively high computational and monetary costs in real-world development. While quantization is a standard approach for reducing cost and latency, its impact on…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-27 · Zhou Ziheng, Huacong Tang, Jinyuan Zhang, Haowei Lin, Bangcheng Yang, Qian Long, Fang Sun, Yizhou Sun, Yitao Liang, Ying Nian Wu, Demetri Terzopoulos, Xiaofeng Gao
Research Track A · General AI
Discovering causal regularities and applying them to build functional systems (the discovery-to-application loop) is a hallmark of general intelligence, yet evaluating this capacity has been hindered by the vast complexity gap between scientific discovery and real-world engineering. We introduce SciCrafter, a Minecraft…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-27 · Amal Akli, Mike Papadakis, Maxime Cordy, Yves Le Traon
General AI
Large language models are widely used for code generation, yet they rely on an implicit assumption that the task descriptions are sufficiently detailed and well-formed. However, in practice, users may provide defective descriptions, which can have a strong effect on code correctness. To address this issue, we develop S…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-27 · Shiyi Zhang, Yiji Cheng, Tiankai Hang, Zijin Yin, Runze He, Yu Xu, Wenxun Dai, Yunlong Lin, Chunyu Wang, Qinglin Lu, Yansong Tang
General AI
Unified multi-modal understanding/generative models have shown improved image editing performance by incorporating fine-grained understanding into their Chain-of-Thought (CoT) process. However, a critical question remains underexplored: what forms of CoT and training strategy can jointly enhance both the understanding …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-05-12 · Yabo Zhang, Kunchang Li, Dewei Zhou, Xinyu Huang, Xun Wang
General AI
While recent advancements in multimodal language models have enabled image generation from expressive multi-image instructions, existing methods struggle to maintain performance under complex interleaved instructions. This limitation stems from the structural separation of images and text in current paradigms, which fo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.2
2026-04-29 · Zylan Benjert, Júlia Komjáthy, Johannes Lengler, John Lapinskas, Ulysse Schaller
General AI
It is a fundamental question in epidemiology to estimate, model and predict the growth rate of a pandemic. Analogously, analysing the diffusion of innovation, (fake) news, memes, and rumours is of key importance in the social sciences. The resulting epidemic growth curves can be classified according to their growth rat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-04-29 · Manar Aljohani, Brandon Ho, Kenneth McKinley, Dennis Ren, Xuan Wang
General AI
Accurate and consistent Emergency Severity Index (ESI) assignment remains a persistent challenge in emergency departments, where highly variable free-text triage documentation contributes to mistriage and workflow inefficiencies. This study evaluates whether open-source small language models (SLMs) can serve as reliabl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-04-29 · Yiqi Liu, Noelle Crawford, Michael Wang, Jilong Xue, Jian Huang
General AI
To overcome the well-known memory bottleneck of AI chips, 3D stacked architectures that employ advanced packaging technology with high-density through-silicon via (TSV) pins have proven to be a promising solution. The 3D-stacked AI chip enables ultra-high memory bandwidth between compute and memory by stacking numero…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-04-30 · Lincan Li, Zheng Chen, Yushun Dong
General AI
Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging. Existing graph construction methods, whether correlation-based or learning-based, often generate redundant or irrelevant edges due to the noisy nature of EEG data. Thi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-05-01 · Jinpai Zhao, Nishant Panda, Yen Ting Lin, Eirik Valseth, Diane Oyen, Clint Dawson
General AI
We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closures, boundary handling) in a query-conditioned way. Rather than learning a monolithic map, HyCOP learns a policy over short programs - which module to apply and for how l…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.2
2026-05-04 · Mingming Zha, Xiaofeng Wang
General AI
Autonomous LLM agents operate as long-running processes with persistent workspaces, memory files, scheduled task state, and messaging integrations. These features create a new propagation risk: attacker-influenced content can be written into persistent agent state, re-enter the LLM decision context through scheduled au…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-05-04 · Quang Hieu Pham, Yang He, Ping Nie, Canwen Xu, Davood Rafiei, Yuepeng Wang, Xi Ye, Jocelyn Qiaochu Chen
General AI
Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline where schema elements are retrieved once upfront and the database is only revisited for post-hoc repair, limiting recovery fr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-05-04 · Tienyu Chang, Zhen Chen, Renjie Liang, Jinyu Ding, Jie Xu, Sunu Mathew, Amir Reza Hajrasouliha, Andrew J. Saykin, Ruogu Fang, Yu Huang, Jiang Bian, Qingyu Chen
General AI
The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-05-04 · Yingtian Shi, Abivishaq Balasubramanian, Jessica Herring, Jiachen Li, Juan Macias Romero, Rosemarie Santa Gonzalez, Varun Mishra, Agata Rozga, Xiang Zhi Tan, Thomas Plötz
General AI
Human activity recognition (HAR) in smart homes remains challenging because many daily activities exhibit similar local sensor patterns, while minimally intrusive sensing provides sparse and ambiguous observations. As a result, methods based on short temporal or event windows often fail to capture the broader temporal …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-05-04 · Pehuén Moure, Niclas Pokel, Bilal Bounajma, Yingqiang Gao, Roman Boehringer, Longbiao Cheng, Shih-Chii Liu
General AI
Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models can make use of such information. We int…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-01-14 · Saber Zerhoudi, Michael Granitzer
Research Track B · General AI
A fundamental tension exists between the demand for sophisticated AI assistance in web search and the need for user data privacy. Current centralized models require users to transmit sensitive browsing data to external services, which limits user control. In this paper, we present a browser extension that provides a vi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.0
2026-03-04 · Soroush Nasiriany, Sepehr Nasiriany, Abhiram Maddukuri, Yuke Zhu
Research Track A · General AI
Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present Rob…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-03-14 · Seokmin Lee, Yunghee Lee, Byeonghyun Pak, Byeongju Woo
General AI
For robotic agents operating in dynamic environments, learning visual state representations from streaming video observations is essential for sequential decision making. Recent self-supervised learning methods have shown strong transferability across vision tasks, but they do not explicitly address what a good visual …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-03-15 · Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee
General AI
Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improve LALM reasoning. We introduce three strategies using diverse i…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-03-19 · Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang, Zhehuai Chen, Sung-Feng Huang, Chih-Kai Yang, Yi-Cheng Lin, Chi-Yuan Hsiao, Wenze Ren, En-Pei Hu, Yu-Han Huang, An-Yu Cheng, Cheng-Han Chiang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee
General AI
Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.0
2026-03-22 · Liang Ding
Research Track B · General AI
LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER,…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.0
2026-03-24 · Wanying Mo, Jijia Lai, Xiaoming Wang
Research Track B · General AI
Browser agents built on LLMs can act in web interfaces, yet most remain confined to a single chat surface (e.g., a sidebar). This mismatch with real browsing can increase context-switching and reduce user control. We introduce IntentWeave, a design space of ten spatial paradigms for embedding agentic assistanc…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.0
2026-03-26 · Jiaqing Zhang, Hao Wang, Mingjia Yin, Bo Chen, Qinglin Jia, Rui Zhou, Ruiming Tang, ChaoYi Ma, Enhong Chen
Research Track A · General AI
Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture comparison or iteration is prohibitively expensive, severely slowing down model deve…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-03-27 · Zhaochong An, Orest Kupyn, Théo Uscidda, Andrea Colaco, Karan Ahuja, Serge Belongie, Mar Gonzalez-Franco, Marta Tintore Gazulla
General AI
Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compromise the generaliza…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-03-30 · Zhang Li, Zhibo Lin, Qiang Liu, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiajun Song, Jiarui Zhang, Xiang Bai, Yuliang Liu
General AI
We introduce Multilingual Document Parsing Benchmark, the first benchmark for multilingual digital and photographed document parsing. Document parsing has made remarkable strides, yet almost exclusively on clean, digital, well-formatted pages in a handful of dominant languages. No systematic benchmark exists to evaluat…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-03-31 · Qiyao Wang, Hongbo Wang, Longze Chen, Zhihao Yang, Guhong Chen, Hamid Alinejad-Rokny, Hui Li, Yuan Lin, Min Yang
General AI
Scientific idea generation (SIG) is critical to AI-driven autonomous research, yet existing approaches are often constrained by a static retrieval-then-generation paradigm, leading to homogeneous and insufficiently divergent ideas. In this work, we propose FlowPIE, a tightly coupled retrieval-generation framework that …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-03-31 · Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar
Research Track B · General AI
There has been growing interest in building agents that can interact with digital platforms to execute meaningful enterprise tasks autonomously. Among the approaches explored are tool-augmented agents built on abstractions such as Model Context Protocol (MCP) and web agents that operate through graphical interfaces. Ye…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-04-06 · Haoxuan Han, Weijie Wang, Zeyu Zhang, Yefei He, Bohan Zhuang
Research Track A · General AI
Recent advancements in Vision-Language Models (VLMs) have significantly pushed the boundaries of Visual Question Answering (VQA). However, high-resolution details can sometimes become noise that leads to hallucinations or reasoning errors. In this paper, we propose Degradation-Driven Prompting (DDP), a novel framework tha…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-04-08 · Yuechen Jiang, Enze Zhang, Md Mohsinul Kabir, Qianqian Xie, Stavroula Golfomitsou, Konstantinos Arvanitis, Sophia Ananiadou
General AI
Recent advances in vision-language models (VLMs) have improved image captioning for cultural heritage. However, inferring structured cultural metadata (e.g., creator, origin, period) from visual input remains underexplored. We introduce a multi-category, cross-cultural benchmark for this task and evaluate VLMs using an…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-04-08 · Jianhui Liu, Haoze Sun, Wenbo Li, Yanbing Zhang, Rui Yang, Zhiliang Zhu, Yijun Yang, Shenghe Zheng, Nan Jiang, Jiaxiu Jiang, Haoyang Huang, Tien-Tsin Wong, Nan Duan, Xiaojuan Qi
General AI
Spatial understanding is a fundamental cornerstone of human-level intelligence. Nonetheless, current research predominantly focuses on domain-specific data production, leaving a critical void: the absence of a principled, open-source engine capable of fully unleashing the potential of high-quality spatial data. To brid…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-04-27 · NVIDIA, Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Arushi Goel, Mike Ranzinger, Greg Heinrich, Guo Chen, Lukas Voegtle, Philipp Fischer, Timo Roman, Karan Sapra, Collin McCarthy, Shaokun Zhang, Fuxiao Liu, Hanrong Ye, Yi Dong, Mingjie Liu, Yifan Peng, Piotr Zelasko, Zhehuai Chen, Nithin Rao Koluguri, Nune Tadevosyan, Lilit Grigoryan, Ehsan Hosseini Asl, Pritam Biswas, Leili Tavabi, Yuanhang Su, Zhiding Yu, Peter Jin, Alexandre Milesi, Netanel Haber, Yao Xu, Sarah Amiraslani, Nabin Mulepati, Eric Tramel, Jaehun Jung, Ximing Lu, Brandon Cui, Jin Xu, Zhiqi Li, Shihao Wang, Yuanguo Kuang, Huck Yang, Boyi Li, Hongxu Yin, Song Han, Pavlo Molchanov, Adi Renduchintala, Charles Wang, David Mosallanezhad, Soumye Singhal, Luis Vega, Katherine Cheung, Sreyan Ghosh, Yian Zhang, Alexander Bukharin, Venkat Srinivasan, Johnny Greco, Andre Manoel, Maarten Van Segbroeck, Suseella Panguliri, Rohit Watve, Divyanshu Kakwani, Shubham Pachori, Jeffrey Glick, Radha Sri-Tharan, Aileen Zaman, Khanh Nguyen, Shi Chen, Jiaheng Fang, Qing Miao, Wenfei Zhou, Yu Wang, Zaid Pervaiz Bhat, Varun Praveen, Arihant Jain, Ramanathan Arunachalam, Tomasz Kornuta, Ashton Sharabiani, Amy Shen, Wei Huang, Yi-Fu Wu, Ali Roshan Ghias, Huiying Li, Brian Yu, Nima Tajbakhsh, Chen Cui, Wenwen Gao, Li Ding, Terry Kong, Manoj Kilaru, Anahita Bhiwandiwalla, Marek Wawrzos, Daniel Korzekwa, Pablo Ribalta, Grzegorz Chlebus, Besmira Nushi, Ewa Dobrowolska, Maciej Jakub Mikulski, Kunal Dhawan, Steve Huang, Jagadeesh Balam, Yongqiang Wang, Nikolay Karpov, Valentin Mendelev, George Zelenfroynd, Meline Mkrtchyan, Omri Almog, Bhavesh Pawar, Rameshwar Shivbhakta, Sudeep Sabnis, Ashrton Sharabiani, Negar Habibi, Geethapriya Venkataramani, Pamela Peng, Prerit Rodney, Serge Panev, Richard Mazzarese, Nicky Liu, Michael Fukuyama, Andrii Skliar, Roger Waleffe, Duncan Riach, Yunheng Zou, Jian Hu, Hao Zhang, Binfeng Xu, Yuhao Yang, Zuhair Ahmed, Carlo del Mundo, Chad Voegele, Zhiyu Cheng, Nave Assaf, Daniel Afrimi, Natan Bagrov, Ran Zilberstein, Ofri Masad, Eugene Khvedchenia, Borys Tymchenko, Tomer Asida, Parth Mannan, Victor Cui, Michael Evans, Katherine Luna, Jie Lou, Pinky Xu, Guyue Huang, Michael Boone, Pradeep Thalasta, Adeola Adesoba, Dina Yared, Christopher Parisien, Leon Derczynski, Shaona Ghosh, Wes Feely, Micah Schaffer, Barnaby Simkin, Tomasz Grzegorzek, Rishabh Garg, Aastha Jhunjhunwala, Sergei Kolchenko, Farzan Memarian, Haran Kumar, Shiv Kumar, Isabel Hulseman, Anjali Shah, Kari Briski, Padmavathy Subramanian, Joey Conway, Udi Karpas, Jane Polak Scowcroft, Annie Surla, Shilpa Ammireddy, Ellie Evans, Jesse Oliver, Tom Balough, Chia-Chih Chen, Sandip Bhaskar, Alejandra Rico, Bardiya Sadeghi, Seph Mard, Meredith Price, Laya Sleiman, Saori Kaji, Wesley Helmholz, Wendy Quan, Michael Lightstone, Jonathan Cohen, Jian Zhang, Oleksii Kuchaiev, Boris Ginsburg, Jan Kautz, Eileen Long, Mohammad Shoeybi, Mostofa Patwary, Oluwatobi Olabiyi, Andrew Tao, Bryan Catanzaro
Research Track B · General AI
We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements over its predecessor, Nemotron Nano V2 VL, across all modalities, enabled by advances in …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-04-28 · Arnon Mazza, Elad Levi
General AI
Deploying guardrails for custom policies remains challenging, as generic safety models fail to capture task-specific requirements, while prompting LLMs suffers from inconsistent boundary-case performance and high inference costs. Training custom classifiers achieves both accuracy and efficiency, yet demands substantial…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2025-12-08 · Alisha Ukani, Hamed Haddadi, Ali Shahin Shamsabadi, Peter Snyder
Research Track B · General AI
This paper presents a systematic evaluation of the privacy behaviors and attributes of eight recent, popular browser agents. Browser agents are software that automate Web browsing using large language models and ancillary tooling. However, the automated capabilities that make browser agents powerful also make them high…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-03-03 · Patrick J. Mineault, Thomas L. Griffiths, Sean Escola
Research Track A · General AI
We propose that the jagged intelligence landscape of modern AI systems arises from a missing training signal that we call "cognitive dark matter" (CDM): brain functions that meaningfully shape behavior yet are hard to infer from behavior alone. We identify key CDM domains: metacognition, cognitive flexibility, episodic …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-03-26 · Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li, Alex Smola
General AI
Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which condi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-03-26 · Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang, Mingxi Cheng, Qi Dai, Yuqing Yang, Lili Qiu, Zhendong Wang, Zhengyuan Yang, Xue Yang, Lijuan Wang, Ji Li, Chong Luo
General AI
Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-wo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-03-26 · Hai X. Pham, David T. Hoffmann, Ricardo Guerrero, Brais Martinez
General AI
Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard neg…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-03-26 · Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang
General AI
The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteB…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-03-30 · Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or
General AI
Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a wide range of generat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-03-31 · Yufeng Li, Rrubaa Panchendrarajan, Arkaitz Zubiaga
General AI
Verifiable claim detection asks whether a claim expresses a factual statement that can, in principle, be assessed against external evidence. As an early filtering stage in automated fact-checking, it plays an important role in reducing the burden on downstream verification components. However, existing approaches to cl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-03-31 · Prasanjit Dey, Soumyabrata Dev, Bianca Schoen-Phelan
General AI
We address the challenge of adapting pre-trained Large Language Models (LLMs) for multivariate time-series analysis, where their deployment is often hindered by prohibitive computational and memory demands. Our solution, One-for-All, introduces Gaussian Rank-Stabilized Low-Rank Adapters (rsLoRA) to enable parameter-eff…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-01 · Ankit Grover, Lodovico Giaretta, Rémi Bourgerie, Sarunas Girdzijauskas
General AI
The integration of Graph Neural Networks (GNNs) with Large Language Models (LLMs) has emerged as a promising paradigm for Graph Question Answering (GraphQA). However, effective methods for encoding complex structural information into the LLM's latent space remain an open challenge. Current state-of-the-art architecture…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-02 · Sarath Shekkizhar, Romain Cosentino, Adam Earle
General AI
Standard LLM benchmarks evaluate the assistant turn: the model generates a response to an input, a verifier scores correctness, and the analysis ends. This paradigm leaves unmeasured whether the LLM encodes any awareness of what follows the assistant response. We propose user-turn generation as a probe of this gap: giv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-02 · Chongjie Ye, Cheng Cao, Chuanyu Pan, Yiming Hao, Yihao Zhi, Yuanming Hu, Xiaoguang Han
General AI
Recent multimodal large language models have achieved strong performance in unified text and image understanding and generation, yet extending such native capability to 3D remains challenging due to limited data. Compared to abundant 2D imagery, high-quality 3D assets are scarce, making 3D synthesis under-constrained. …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-07 · Yanis Labrak, David Grünert, Séverin Baroudi, Jiyun Chun, Pawel Cyrta, Sergio Burdisso, Ahmed Hassoon, David Liu, Adam Rothschild, Reed Van Deusen, Petr Motlicek, Andrew Perrault, Ricard Marxer, Thomas Schaaf
General AI
Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-09 · Feng Luo, Yu-Neng Chuang, Guanchu Wang, Zicheng Xu, Xiaotian Han, Tianyi Zhang, Vladimir Braverman
General AI
On-policy distillation (OPD) trains student models under their own induced distribution while leveraging supervision from stronger teachers. We identify a failure mode of OPD: as training progresses, on-policy rollouts can undergo abrupt length inflation, causing truncated trajectories to dominate the training data. Th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-09 · Boyang Zhang, Sebastián G. Acosta, Preston Carlson, Sacha Bron, Pierre-Loïc Doulcet, Simon Suo
General AI
AI agents are changing the requirements for document parsing. What matters is semantic correctness: parsed output must preserve the structure and meaning needed for autonomous decisions, including correct table structure, precise chart data, semantically meaningful formatting, and visual grounding. Existing benc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-28 · Chu-Cheng Lin, Eugene Ie
General AI
Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewards (RLVR) when the initial success probability $p_0$ is small. Using the Tsallis $q$-logarithm, we define a loss family $J_Q$ that interpolates between RLVR (at $q{=}0$…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-28 · Zeming Dong, Yuejun Guo, Qiang Hu, Yao Zhang, Maxime Cordy, Hao Liu, Mike Papadakis, Yongqiang Lyu
General AI
Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information em…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-28 · Steve Coyne
General AI
Preference-based alignment methods, most prominently Reinforcement Learning with Human Feedback (RLHF), use the judgments of human annotators to shape large language model behaviour. However, the normative role of these judgments is rarely made explicit. I distinguish three conceptual models of that role. The first is …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-28 · Even Eilertsen, Vasileios Mavroeidis, Gudmund Grov
General AI
Security analysts are overwhelmed by the volume of alerts and the low context provided by many detection systems. Early-stage investigations typically require manual correlation across multiple log sources, a task that is usually time-consuming. In this paper, we present an experimental, agentic workflow that leverages…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-04-28 · Clinton Enwerem, Shreya Kalyanaraman, John S. Baras, Calin Belta
General AI
Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-07 · Ryan Wang, Akshita Bhagia, Sewon Min
General AI
Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset of experts per inpu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-05-07 · Yixuan Wang, Dan Guralnik, Warren Dixon
General AI
Safety-critical autonomy in adversarial settings demands more than Lyapunov stability of tracking error signals. An agent executing a goal-directed trajectory is intrinsically legible to a passive observer running online Bayesian inference, because the contractive dynamics of any Lyapunov basin of attraction concentrat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-05-07 · Amir Ivry
General AI
Large audio language models (LALMs) are increasingly used to reason over long audio clips, yet deployment often compresses audio before inference to reduce memory and latency. The risk is that compression can leave aggregate accuracy acceptable while sharply degrading answers for a deployment-critical query family. We …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.5
2026-04-09 · Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo
Research Track A · General AI
Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. A field-wide goal is to achieve generalizable, cross-subject models. A major obstacle towards this goal is the substanti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-10 · Aarush Sinha, Arion Das, Soumyadeep Nag, Charan Karnati, Shravani Nag, Chandra Vadhan Raj, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das
General AI
As large language models (LLMs) are increasingly deployed as autonomous agents, understanding how strategic behavior emerges in multi-agent environments has become an important alignment challenge. We take a neutral empirical stance and construct a controlled environment in which strategic behavior can be directly obse…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-13 · Shuquan Lian, Juncheng Liu, Yazhe Chen, Yuhong Chen, Hui Li
General AI
Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning required for deep analysis and handling complex edge cases. While recent reasoning models demonstrate the potential of extended Chain-of-Thought (CoT), applying them to the multi-turn SWE …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.5
2026-04-15 · Julian Killingback, Ofer Meshi, Henry Li, Hamed Zamani, Maryam Karimzadehgan
Research Track A · General AI
Traditional Retrieval-Augmented Generation (RAG) approaches generally assume that retrieval and generation occur on powerful servers removed from the end user. While this reduces local hardware constraints, it introduces significant drawbacks: privacy concerns regarding data access, recurring maintenance and storage co…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-16 · Ido Galil, Moshe Kimhi, Ran El-Yaniv
General AI
Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of parameter bits. We introduce Deep Neural Lesion (DNL), a data-free and optimization-free method that locates critical parameters, and an enhanced single-pass variant, 1P-DNL, that refines this selection with one forward and backw…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-16 · Yifu Chen, Shengpeng Ji, Qian Chen, Tianle Liang, Yangzhuo Li, Ziqing Wang, Wen Wang, Jingyu Lu, Haoxiao Wang, Xueyi Pu, Fan Zhuo, Zhou Zhao
General AI
End-to-end spoken dialogue models have garnered significant attention because they offer a higher potential ceiling in expressiveness and perceptual ability than cascaded systems. However, the intelligence and expressiveness of current open-source spoken dialogue models often remain below expectations. Motivated by the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-19 · Qingcheng Zeng, Yuheng Lu, Zeqi Zhou, Heli Qi, Puxuan Yu, Fuheng Zhao, Hitomi Yanaka, Weihao Xuan, Naoto Yokoya
General AI
Code-switching is a pervasive linguistic phenomenon in global communication, yet modern information retrieval systems remain predominantly designed for, and evaluated within, monolingual contexts. To bridge this critical disconnect, we present a holistic study dedicated to code-switching IR. We introduce CSR-L (Code-Sw…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-20 · Qingcheng Zeng, Puxuan Yu, Aman Mehta, Fuheng Zhao, Rajhans Samdani
General AI
Instruction-following information retrieval (IF-IR) studies retrieval systems that must not only find documents relevant to a query, but also obey explicit user constraints such as required attributes, exclusions, or output preferences. However, most retrievers are trained primarily for semantic relevance and often fai…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-22 · Wenhong Zhu, Ruobing Xie, Rui Wang, Pengfei Liu
General AI
Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections bet…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.5
2026-04-22 · Daniele Corradetti, Renato Corradetti
Research Track A · General AI
We present a biologically detailed extension of the classical Hopfield/Marr auto-associative memory model for CA3, implementing ten populations (two asymmetric pyramidal subtypes, eight GABAergic interneuron classes), forty-seven compartments, multi-rule plasticity (recurrent Hebb, BCM anti-saturation, mossy-fiber shor…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-23 · Yao Zhang, Zhuchenyang Liu, Thomas Ploetz, Yu Xiao
General AI
The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-langua…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-23 · Kwan Yun, Changmin Lee, Ayeong Jeong, Youngseo Kim, Seungmi Lee, Junyong Noh
General AI
Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, exhibit severe brittleness under stylization. They often mis…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-24 · Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, Yichen Zhu
General AI
Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy evaluation. In this paper, we propose dWorldEval, which uses a discrete diffusion world model as a scalable evaluation p…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-27 · Mohammadmehdi Ataei, Farzaneh Askari, Kamal Rahimi Malekshan, Pradeep Kumar Jayaraman
General AI
Computer-Aided Design (CAD) models are defined by their construction history: a parametric recipe that encodes design intent. However, existing large-scale 3D datasets predominantly consist of boundary representations (B-Reps) or meshes, stripping away this critical procedural information. To address this scarcity, we …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-05-06 · Yu-Cheng Lin, Yu-Chao Hsu, I-Shan Tsai, Chun-Hua Lin, Kuo-Chung Peng, Jiun-Cheng Jiang, Yun-Yuan Wang, Tzung-Chi Huang, Tai-Yue Li, Kuan-Cheng Chen, Samuel Yen-Chi Chen, Nan-Yow Chen
General AI
High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum circuit simulation, and selected configuration interaction postprocessing. We present the generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE), a parameter-ef…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-05-12 · Jiafei Lyu, Zichuan Lin, Scott Fujimoto, Kai Yang, Yangkun Chen, Saiyong Yang, Zongqing Lu, Deheng Ye
General AI
Model-based representations have recently emerged as a promising framework that embeds latent dynamics information into the representations for downstream off-policy actor-critic learning. It implicitly combines the advantages of both model-free and model-based approaches while avoiding the training costs associated with …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-05-12 · Haoyu Zhang, Zeyu Zhang, Zedong Zhou, Yang Zhao, Hao Tang
General AI
Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increas…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.4
2026-04-29 · Zhen Zhang, Changyi Yang, Zijie Xia, Zhen Yang, Chengzhi Liu, Zhaotiao Weng, Yepeng Liu, Haobo Chen, Jin Pan, Chenyang Zhao, Yuheng Bu, Alkesh Patel, Zhe Gan, Xin Eric Wang
General AI
The token serves as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introd…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.4
2026-05-01 · Minghui Chen, Chenxu Yang, Hengjie Zhu, Dayan Wu, Zheng Lin, Qingyi Si
General AI
Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Pe…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.4
2026-05-01 · Houyuan Chen, Hong Li, Xianghao Kong, Tianrui Zhu, Shaocong Xu, Weiqing Xiao, Yuwei Guo, Chongjie Ye, Lvmin Zhang, Hao Zhao, Anyi Rao
General AI
Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.4
2026-05-04 · Yangfu Li, Yuning Gong, Hongjian Zhan, Teng Li, Yuanhuiyi Lyu, Tianyi Chen, Qi Liu, Ziyuan Huang, Zhihang Zhong, Dandan Zheng, Yue Lu
General AI
Despite the success of Large Vision-Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods introduce geometric priors from visual experts as additional supervision. However, we obs…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-04-12 · Wenhao Zhang, Lin Mu, Li Ni, Peiquan Jin, Yiwen Zhang
General AI
Low-rank adaptation (LoRA) is a widely used strategy for efficient fine-tuning of large language models (LLMs), but its strictly linear structure fundamentally limits expressive capacity. The bilinear formulation of weight updates captures only first-order dependencies between low-rank factors, restricting the modeling…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-04-13 · J. Oppliger, M. Stifter, A. Rüegg, I. Biało, L. Martinelli, P. G. Freeman, D. Prabhakaran, J. Zhao, Q. Wang, J. Chang
General AI
Automation underpins progress across scientific and industrial disciplines. Yet automating tasks that require interpreting abstract visual information remains challenging. For example, crystal alignment strongly relies on humans with the ability to comprehend diffraction patterns. Here we introduce an autonomous syst…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-13 · Chenxi Qing, Junxi Wu, Zheng Liu, Yixiang Qiu, Hongyao Yu, Bin Chen, Hao Wu, Shu-Tao Xia
General AI
Large language models (LLMs) have recently become capable of generating highly fluent textual content. While they offer significant convenience to humans, they also introduce various risks, like phishing and academic dishonesty. Numerous research efforts have been dedicated to developing algorithms for detecting AI-generated …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-13 · Wei Zhao, Zhe Li, Peixin Zhang, Jun Sun
General AI
Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly inc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-13 · Adam Stein, Davis Brown, Hamed Hassani, Mayur Naik, Eric Wong
General AI
To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare, complex, and sometimes even adversarially hidden and only detectable when multiple traces are analyzed together. These challenges arise in diverse settings such as misuse campa…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-13 · Shiyu Teng, Jiaqing Liu, Hao Sun, Yu Li, Shurong Chai, Ruibo Hou, Tomoko Tateyama, Lanfen Lin, Yen-Wei Chen
General AI
Depression remains widely underdiagnosed and undertreated because stigma and subjective symptom ratings hinder reliable screening. To address this challenge, we propose a coarse-to-fine, multi-stage framework that leverages large language models (LLMs) for accurate and interpretable detection. The pipeline performs bin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-14 · Jiahao Shao, Anam Nawaz Khan, Christopher Brett, Tom Berg, Xueping Li, Bing Yao
General AI
Pathology reports serve as the definitive record for breast cancer staging, yet their unstructured format impedes large-scale data curation. While Large Language Models (LLMs) offer semantic reasoning, their deployment is often limited by high computational costs and hallucination risks. This study introduces a paramet…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-04-16 · Zoe Fingleton, Nazanin Siavash, Armin Moin
General AI
In this paper, we focus on automating two of the widely used Verification and Validation (V&V) activities in the Software Development Lifecycle (SDLC): software testing and software inspection (also known as review). Concerning the former, we concentrate on automated test case generation using Large Language Models (LL…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-16 · Mélanie Roschewitz, Kenneth Styppa, Yitian Tao, Jiwoong Sohn, Jean-Benoit Delbrouck, Benjamin Gundersen, Nicolas Deperrois, Christian Bluethgen, Julia Vogt, Bjoern Menze, Farhad Nooralahzadeh, Michael Krauthammer, Michael Moor
General AI
Vision-language models (VLMs) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-16 · Aihua Li
General AI
Flow matching retains the generation quality of diffusion models while enabling substantially faster inference, making it a compelling paradigm for generative modeling. However, when applied to language modeling, it exhibits fundamental limitations in representing complex latent distributions with irregular geometries,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-16 · Joel Perca, Luis Sante, Juanpablo Heredia, Joao Rulff, Claudio Silva, Jorge Poco
General AI
Extracting actionable insights from long-duration urban videos is often labor-intensive: analysts must manually sift through raw footage to pinpoint target events or uncover broader behavioral trends. In this work, we present URBANCLIPATLAS, a visual analytics system for exploring long urban videos recorded at street i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-21 · Nico Baumgart, Markus Lange-Hegermann, Jan Henze
General AI
Efficient semantic access to industrial product data is a key enabler for factory automation and emerging LLM-based agent workflows, where both human engineers and autonomous agents must identify suitable components from highly structured catalogs. However, the vocabulary mismatch between natural-language queries and a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-21 · Segun Aroyehun, Stephan Lewandowsky, David Garcia
General AI
The pursuit of truth is central to democratic deliberation and governance, yet political discourse reflects varying epistemic orientations, ranging from evidence-based reasoning grounded in verifiable information to intuition-based reasoning rooted in beliefs and subjective interpretation. We introduce a scalable appro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-21 · Boyan Shi, Wei Chen, Shuyuan Zhao, Junfeng Shen, Shengnan Guo, Shaojiang Wang, Huaiyu Wan
General AI
The combination of Mixture-of-Experts (MoE) and Low-Rank Adaptation (LoRA) has shown significant potential for enhancing the multi-task learning capabilities of Large Language Models. However, existing methods face two primary challenges: (1) Imprecise Routing in the current MoE-LoRA method fails to explicitly match inp…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-21 · Austin Coursey, Abel Diaz-Gonzalez, Marcos Quinones-Grueiro, Gautam Biswas
Research Track A · General AI
Reinforcement learning (RL) offers a compelling data-driven paradigm for synthesizing controllers for complex systems when accurate physical models are unavailable; however, most existing control-oriented RL methods assume stationarity and, therefore, struggle in real-world non-stationary deployments where system dynam…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-22 · Guotao Liang, Zhangcheng Wang, Juncheng Hu, Haitao Zhou, Ziteng Xue, Jing Zhang, Dong Xu, Qian Yu
General AI
Multimodal Large Language Models (MLLMs) have shown promising capabilities in generating Scalable Vector Graphics (SVG) via direct code synthesis. However, existing paradigms typically adopt an open-loop "blind drawing" approach, where models generate symbolic code sequences without perceiving intermediate visual outco…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-22 · Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, Yu Feng
General AI
LLM agents have begun to find real security vulnerabilities that human auditors and automated fuzzers missed for decades, in source-available targets where the analyst can build and instrument the code. In practice the work is split among several agents, wired together by a harness: the program that fixes which roles e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-23 · Songen Gu, Yuhang Zheng, Weize Li, Yupeng Zheng, Yating Feng, Xiang Li, Yilun Chen, Pengfei Li, Wenchao Ding
General AI
Recently, end-to-end robotic manipulation models have gained significant attention for their generalizability and scalability. However, they often suffer from limited robustness to camera viewpoint changes when trained with a fixed camera. In this paper, we propose VistaBot, a novel framework that integrates feed-forw…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-23 · Pegah Khayatan, Jayneel Parekh, Arnaud Dapogny, Mustafa Shukor, Alasdair Newson, Matthieu Cord
General AI
Despite impressive progress in the capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the vision backbone or the dominance of the…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-24 · Hong Su
General AI
Autonomous robots operating in open environments need the ability to continuously handle tasks that are not covered by predefined local methods. However, existing approaches often rely on repeated large-language-model (LLM) interaction for uncovered tasks, and even successful executions or observed successful external …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-04-24 · Jia Li, Hongyi Deng, Yiran Zhang, Kechi Zhang, Tianqi Shao, Tiankuo Zhao, Weinan Wang, Zhi Jin, Ge Li, Yang Liu, Yingtao Fang, Yihong Dong
General AI
Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress using Large Language Models (LLMs) for code generation. Many benchmarks like HumanEval and EvoCodeBench have been created to evaluate LLMs by requiring them to generate code fr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-04-27 · Dibyadip Chatterjee, Zhanzhong Pang, Fadime Sener, Yale Song, Angela Yao
General AI
Streaming video models should respond the moment an event unfolds, not after the moment has passed. Yet existing online VideoQA benchmarks remain largely retrospective. They pause the video at fixed timestamps, pose questions about current or past events, and score models only at those moments. This protocol leaves str…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-27 · Sinin Zhang, Yunfei Xie, Yuxuan Cheng, Haoyu Zhang, Tong Zhang
General AI
Vision-Language Models (VLMs) have demonstrated strong performance on textbook-style physics problems, yet they frequently fail when confronted with dynamic real-world scenarios that require temporal consistency and causal reasoning across frames. We identify two fundamental challenges underlying these failures: (1) sp…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-27 · Zijian Guo, İlker Işık, H. M. Sabbir Ahmad, Wenchao Li
General AI
Specification-guided reinforcement learning (RL) provides a principled framework for encoding complex, temporally extended tasks using formal specifications such as linear temporal logic (LTL). While recent methods have shown promising results, their ability to generalize across unseen specifications and diverse enviro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-28 · Shuxiang Cao, Zijian Zhang, Abhishek Agarwal, Grace Bratrud, Niyaz R. Beysengulov, Daniel C. Cole, Alejandro Gómez Frieiro, Elena O. Glen, Hao Hsu, Gang Huang, Raymond Jow, Greshma Shaji, Tom Lubowe, Ligeng Zhu, Luis Mantilla Calderón, Nicola Pancotti, Joel Pendleton, Brandon Severin, Charles Etienne Staub, Sara Sussman, Antti Vepsäläinen, Neel Rajeshbhai Vora, Yilun Xu, Varinia Bernales, Daniel Bowring, Elica Kyoseva, Ivan Rungger, Giulia Semeghini, Sam Stanwyck, Timothy Costa, Alán Aspuru-Guzik, Krysta Svore
Research Track A · General AI
Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEval, the first VLM benchmark for quantum …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-04-29 · Youyuan Zhang, Jialiang Sun, Hangrui Bi, Chuqin Geng, Wenjie Ma, Zhaoyu Li, Xujie Si
General AI
We introduce DreamProver, an agentic framework that leverages a "wake-sleep" program induction paradigm to discover reusable lemmas for formal theorem proving. Existing approaches either rely on fixed lemma libraries, which limit adaptability, or synthesize highly specific intermediate lemmas tailored to individual the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-04-29 · Fangqiang Fan, Zhicheng Zhao, Xiaoliang Ma, Chenglong Li, Jin Tang
General AI
Fine-grained RGBT image semantic segmentation is crucial for all-weather unmanned aerial vehicle (UAV) scene understanding. However, UAV RGBT semantic segmentation faces two coupled challenges: cross-modal spatial misalignment caused by sensor parallax and platform vibration, and severe semantic confusion among fine-gr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-04-29 · Wenxuan Ye, Yangyang Zhang, Xueli An, Georg Carle, Yunpu Ma
General AI
Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls intro…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-04-29 · Lingfeng Zhang, Xiaoshuai Hao, Xizhou Bu, Yingbo Tang, Hongsheng Li, Jinghui Lu, Xiu-shen Wei, Jiayi Ma, Yu Liu, Jing Zhang, Hangjun Ye, Xiaojun Liang, Long Chen, Wenbo Ding
General AI
Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-h…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-05-04 · Lingxiao Kong, Cong Yang, Oya Deniz Beyan, Zeyd Boukhers
General AI
Despite significant advances in Reinforcement Learning (RL), model performance remains highly sensitive to algorithm and hyperparameter configurations, while generalization gaps across environments complicate real-world deployment. Although prior work has studied RL generalization, the relative contribution of specific…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-05-04 · Vicente Pelechano, Antoni Mestre, Manoli Albert, Miriam Gil
General AI
Deciding how to distribute work between humans and AI systems is a central challenge in organisational design. Most approaches treat this as a binary choice, yet the operational reality is richer: humans and AI routinely share tasks or take complementary roles depending on context, fatigue, and the stakes involved. Gov…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-05-04 · Haoquan Fang, Jiafei Duan, Donovan Clay, Sam Wang, Shuo Liu, Weikai Huang, Xiang Fan, Wei-Chuan Tsai, Shirui Chen, Yi Ru Wang, Shanli Xing, Jaemin Cho, Jae Sung Park, Ainaz Eftekhar, Peter Sushko, Karen Farley, Angad Wadhwa, Cole Harrison, Winson Han, Ying-Chun Lee, Eli VanderBilt, Rose Hendrix, Suveen Ellawela, Lucas Ngoo, Joyce Chai, Zhongzheng Ren, Ali Farhadi, Dieter Fox, Ranjay Krishna
General AI
Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency fo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-05-04 · Fatih Aksu, Laura Ciuffetti, Francesco Di Feola, Filippo Ruffini, Giulia Romoli, Fabrizia Gelardi, Arturo Chiti, Valerio Guarrasi, Paolo Soda
General AI
Accurate histological differentiation between adenocarcinoma (ADC) and squamous cell carcinoma (SCC) is critical for personalized treatment in non-small cell lung cancer (NSCLC). While [¹⁸F]FDG PET/CT is a standard tool for the clinical evaluation of lung cancer, its utility is often limited by high costs and radi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-02-22 · Juan Rodriguez, Haotian Zhang, Abhay Puri, Tianyang Zhang, Rishav Pramanik, Meng Lin, Xiaoqing Xie, Marco Terral, Darsh Kaushik, Aly Shariff, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli
General AI
We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. Our benchmark comprises four t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-03-27 · Antoine Edy, Max Conti, Quentin Macé
General AI
While Late Interaction models exhibit strong retrieval performance, many of their underlying dynamics remain understudied, potentially hiding performance bottlenecks. In this work, we focus on two topics in Late Interaction retrieval: a length bias that arises when using multi-vector scoring, and the similarity distrib…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-03-30 · Kailai Feng, Yuxiang Wei, Bo Chen, Yang Pan, Hu Ye, Songwei Liu, Chenqian Yan, Yuan Gao
General AI
Diffusion models have made significant progress in both text-to-image (T2I) generation and text-guided image editing. However, these models are typically built with billions of parameters, leading to high latency and increased deployment challenges. While on-device diffusion models improve efficiency, they largely focu…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.0
2026-03-31 · Shifang Zhao, Yihan Hu, Ying Shan, Yunchao Wei, Xiaodong Cun
General AI
Editing video content in alignment with audio is a form of human-made digital art on current social media. However, the time-consuming and repetitive nature of manual video editing has long been a challenge for filmmakers and professional content creators alike. In this paper, we introduce CutClaw, an autonomous multi-agen…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.0
2026-04-06 · Andrew Sellergren, Chufan Gao, Fereshteh Mahvar, Timo Kohlberger, Fayaz Jamil, Madeleine Traverse, Alberto Tono, Bashir Sadjad, Lin Yang, Charles Lau, Liron Yatziv, Tiffany Chen, Bram Sterling, Kenneth Philbrick, Richa Tiwari, Yun Liu, Madhuram Jajoo, Chandrashekar Sankarapu, Swapnil Vispute, Harshad Purandare, Abhishek Bijay Mishra, Sam Schmidgall, Tao Tu, Anil Palepu, Chunjong Park, Tim Strother, Rahul Thapa, Yong Cheng, Preeti Singh, Kat Black, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Joelle Barral, Tris Warkentin, Shravya Shetty, Dale Webster, Sunny Virmani, David F. Steiner, Can Kirmizibayrak, Daniel Golden
General AI
We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes and histopathology whole slide images), anatomical localization via bounding boxes, multi-timepoint chest X-ray analysis,…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-04-09 · Junjie Fei, Jun Chen, Zechun Liu, Yunyang Xiong, Chong Zhou, Wei Wen, Junlin Han, Mingchen Zhuge, Saksham Suri, Qi Qian, Shuming Liu, Lemeng Wu, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Chenchen Zhu
General AI
Adapting Multimodal Large Language Models (MLLMs) for hour-long videos is bottlenecked by context limits. Dense visual streams saturate token budgets and exacerbate the lost-in-the-middle phenomenon. Existing heuristics, like sparse sampling or uniform pooling, blindly sacrifice fidelity by discarding decisive moments …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-04-27 · Phung Gia Huy, Hai An Vu, Minh-Phuc Truong, Thang Duc Tran, Linh Ngo Van, Thanh Hong Nguyen, Trung Le
Research Track A · General AI
Representation learning is fundamental to NLP, but building embeddings that work well at different computational budgets is challenging. Matryoshka Representation Learning (MRL) offers a flexible inference paradigm through nested embeddings; however, learning such structures requires explicit coordination of how inform…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.0
2026-05-06 · Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov, Mikhail Kulakov
General AI
We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned har…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-05-07 · Zhengru Fang, Yanan Ma, Yu Guo, Senkang Hu, Yixian Zhang, Hangcheng Cao, Wenbo Ding, Yuguang Fang
Research Track A · General AI
When a chest X-ray shows consolidation but the question asks which finding is present, a medical vision-language model may answer "No consolidation." This is more than an incorrect choice: it is a polarity reversal that emits a clinical statement contradicting the image. We study this failure as negated-option attracti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-05-07 · Ajay Jaiswal, Lauren Hannah, Han-Byul Kim, Duc Hoang, Mehrdad Farajtabar, Minsik Cho
General AI
We revisit a universally accepted but under-examined design choice in every modern LLM: a token index is looked up once at the input embedding layer and then permanently discarded. This single-injection assumption induces two structural failures: (i) the Rare Token Problem, where a Zipf-type distribution of vocabulary …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-03-23 · Donald Shenaj, Federico Errica, Antonio Carta
General AI
Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the pers…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-03-24 · Wei Sun, Ting Wang, Xinran Tian, Wanshun Lan, Xuhan Feng, Haoyue Li, Fangxin Wang
General AI
Existing LLM-based Kubernetes diagnostic systems cannot learn from operational experience: they operate on static knowledge bases without improving from past resolutions. We present MetaKube, an experience-aware LLM framework built on three synergistic innovations: (1) an Episodic Pattern Memory Network (EPMN) that abstrac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-03-26 · Moiz Sadiq Awan, Muhammad Haris Noor, Muhammad Salman Munaf
Research Track A · General AI
Automated benchmarks dominate the evaluation of large language models, yet no systematic study has compared user satisfaction, adoption motivations, and frustrations across competing platforms using a consistent instrument. We address this gap with a cross-platform survey of 388 active AI chat users, comparing satisfac…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-03-26 · Sk Miraj Ahmed, Xi Yu, Yunqi Li, Yuewei Lin, Wei Xu
General AI
Accurate biodiversity identification from large-scale field data is a foundational problem with direct impact on ecology, conservation, and environmental monitoring. In practice, the core task is taxonomic prediction: inferring order, family, genus, or species from imperfect inputs such as specimen images, DNA barcode…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-03-26 · Lei Wang, YuXin Song, Ge Wu, Haocheng Feng, Hang Zhou, Jingdong Wang, Yaxing Wang, jian Yang
General AI
Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applications such as personalized advertising and virtual try-on. In practice, existing R2V methods typically introduce additional high-level seman…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-03-29 · Fengxiang Li, Han Zhang, Haoyang Huang, Jinghui Wang, Jinhua Hao, Kun Yuan, Mengtong Li, Minglei Zhang, Pengcheng Xu, Wenhao Zhuang, Yizhen Shao, Zongxian Feng, Can Tang, Chao Wang, Chengxiao Tong, Fan Yang, Gang Xiong, Haixuan Gao, Han Gao, Hao Wang, Haochen Liu, Hongliang Sun, Jiabao Li, Jingwen Chang, Jun Du, Junyi Peng, Leizhen Cui, Meimei Jing, Mingqi Wu, Shangpeng Yan, Shaotong Qi, Suzhe Xu, Wenxuan Zhao, Xianda Sun, Xuan Xie, Yanbo Wang, Yao Xia, Yinghan Cui, Yingpeng Chen, Yong Wang, Yuze Shi, Zhiwei Shen, Ziyu Wang, Ming Sun, Lin Ye, Bin Chen
General AI
We present KAT-Coder-V2, an agentic coding model developed by the KwaiKAT team at Kuaishou. KAT-Coder-V2 adopts a "Specialize-then-Unify" paradigm that decomposes agentic coding into five expert domains (SWE, WebCoding, Terminal, WebSearch, and General), each undergoing independent supervised fine-tuning and reinforce…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-03-30 · Gabriele Gemmi, Michele Polese, Tommaso Melodia
General AI
The large-scale deployment of 5G networks has not delivered the expected return on investment for mobile network operators, raising concerns about the economic viability of future 6G rollouts. At the same time, surging demand for Artificial Intelligence (AI) inference and training workloads is straining global compute …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-03-30 · Maoguo Gao, Zejun Zhu, Zhiming Sun, Zhengwei Ma, Longze Yuan, Zhongjing Ma, Zhigang Gao, Jinhui Zhang, Suli Zou
General AI
Open-Vocabulary Object Navigation (OVON) requires an embodied agent to locate a language-specified target in unknown environments. Existing zero-shot methods often reason over dense frontier points under incomplete observations, causing unstable route selection, repeated revisits, and unnecessary action overhead. We pr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-03-30 · Mohamed Elgouhary, Amr S. El-Wakeel
General AI
Pure Pursuit (PP) is a widely used path-tracking algorithm in autonomous vehicles due to its simplicity and real-time performance. However, its effectiveness is sensitive to the choice of lookahead distance: shorter values improve cornering but can cause instability on straights, while longer values improve smoothness …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-03-30 · Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khanh-Duy Le, Minh-Triet Tran, Tam V. Nguyen, Trung-Nghia Le
General AI
The Four Books have shaped East Asian intellectual traditions, yet their multi-layered interpretive complexity limits their accessibility in the digital age. While traditional bilingual commentaries provide a vital pedagogical bridge, computational frameworks are needed to preserve and explore this wisdom. This paper b…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-03-31 · Theodora Panagea, Nikolaos Koursioumpas, Lina Magoula, Ramin Khalili
General AI
Progressing toward a new generation of mobile networks, a clear focus on integrating distributed intelligence across the system is observed to drive performance, autonomy, and real-time adaptability. Federated learning (FL) stands out as a key emerging technique, enabling on-device model training while preserving data …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-03-31 · Yudong Gao, Zongjie Li, Yuanyuanyuan, Zimo Ji, Pingchuan Ma, Shuai Wang
General AI
LLM-based coding agents rely on "skills", pre-packaged instruction sets that extend agent capabilities, yet every token of skill content injected into the context window incurs both monetary cost and attention dilution. To understand the severity of this problem, we conduct a large-scale empirical study of 55,315 …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-04-01 · Kazuki Yano, Jun Suzuki, Shinji Watanabe
General AI
Adapting pre-trained text Large Language Models (LLMs) into Speech Language Models (Speech LMs) via continual pretraining on speech data is promising, but often degrades the original text capabilities. We propose Multimodal Depth Upscaling, an extension of an emerging strategy in continual LLM pre-training, where new t…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-04-02 · Abinitha Gourabathina, Inkit Padhi, Manish Nagireddy, Subhajit Chaudhury, Prasanna Sattigeri
General AI
For Large Language Models (LLMs) to be reliably deployed, models must effectively know when not to answer: abstain. Reasoning models, in particular, have gained attention for impressive performance on complex tasks. However, reasoning models have been shown to have worse abstention abilities. Taking the vulnerabilities…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-04-02 · Karan Taneja, Anjali Singh, Ashok K. Goel
General AI
Multimodal Large Language Models (MLLMs) offer an opportunity to support multimedia learning through conversational systems grounded in educational content. However, while conversational AI is known to boost engagement, its impact on learning in visually-rich STEM domains remains under-explored. Moreover, there is limi…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-04-02 · Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan, Ta-Ying Cheng
General AI
Existing video object removal methods excel at inpainting content "behind" the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions, such as collisions with other objects, current models fail to correct them and produce impl…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-04-03 · Zhenyu Gao, Wenxi Jiang, Yutong Yan
General AI
Prior research shows that large language models (LLMs) exhibit systematic extrapolation bias when forming predictions from both experimental and real-world data, and that prompt-based approaches appear limited in alleviating this bias. We propose a supervised fine-tuning (SFT) approach that uses Low-Rank Adaptation (Lo…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-04-03 · Haotian Xiang, Bingcong Li, Qin Lu
General AI
When deploying large language models (LLMs) to safety-critical applications, uncertainty quantification (UQ) is of utmost importance to self-assess the reliability of the LLM-based decisions. However, such decisions typically suffer from overconfidence, particularly after parameter-efficient fine-tuning (PEFT) for down…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-04-06 · Rafael O. Jarczewski, Gabriel U. Talasso, Leandro Villas, Allan M. de Souza
General AI
Although Federated Learning (FL) promises privacy and distributed collaboration, its effectiveness in real-world scenarios is often hampered by the stochastic heterogeneity of clients and unpredictable system dynamics. Existing static optimization approaches fail to adapt to these fluctuations, resulting in resource un…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-04-06 · Mohammad Zangooei, Jannis Weil, Amr Rizk, Mina Tahmasbi Arashloo, Raouf Boutaba
General AI
Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congestion control. For safe deployment, however, it is critical to reason about how agents behave across the range of system st…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-04-07 · Pranjal Aggarwal, Graham Neubig, Sean Welleck
General AI
Computer-use agents hold the promise of assisting in a wide range of digital economic activities. However, current research has largely focused on short-horizon tasks over a limited set of software with limited economic value, such as basic e-commerce and OS-configuration tasks. A key reason is that creating environmen…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-04-07 · Hao Chen, Fang Qiu, Fangchao Dong, Defei Yang, Eve Bohnett, Li An
General AI
This study proposes a lightweight multimodal adaptation framework to bridge the representation gap between RGB-pretrained VLMs and thermal infrared imagery, and demonstrates its practical utility using a real drone-collected dataset. A thermal dataset was developed from drone-collected imagery and was used to fine-tune…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-04-07 · Sangwook Lee, Sang Won Lee, Adnan Abbas, Young-Ho Kim, Yan Chen
General AI
Modern task-oriented chatbots present GUI elements alongside natural-language dialogue, yet the agent's role has largely been limited to interpreting natural-language input as GUI actions and following a linear workflow. In preference-driven, multi-step tasks such as booking a flight or reserving a restaurant, earlier …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-04-09 · Zhiyuan Wang, Erzhen Hu, Mark Rucker, Laura E. Barnes
General AI
Personal AI tools can now be generated from natural-language requests, but they often remain isolated after creation. We present PSI, a shared-state architecture that turns independently generated modules into coherent instruments: persistent, connected, and chat-complementary artifacts accessible through both GUIs and…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 8.8
2026-04-28 · Jan Dubiński, Jan Betley, Anna Sztyber-Betley, Daniel Tan, Owain Evans
General AI
Finetuning a language model can lead to emergent misalignment (EM) [Betley et al., 2025b]. Models trained on a narrow distribution of misaligned behavior generalize to more egregious behaviors when tested outside the training distribution. We study a set of interventions proposed to reduce EM. We confirm that these int…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-04-28 · Lucio La Cava, Andrea Tagarelli
General AI
Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-04-28 · An Nguyen, Hoang Nguyen, Phuong Le, Hung Pham, Cuong Do, Laurent El Ghaoui
General AI
We study city-scale control of electric-vehicle (EV) ride-hailing fleets where dispatch, repositioning, and charging decisions must respect charger and feeder limits under uncertain, spatially correlated demand and travel times. We formulate the problem as a hex-grid semi-Markov decision process (semi-MDP) with mixed a…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-05-07 · Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang
General AI
Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decompositio…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-05-07 · Suoxin Zhang, Run He, Di Fang, Xiang Tan, Kaixuan Chen, Huiping Zhuang
General AI
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method that places trainable low-rank adapters into frozen pre-trained models. Recent studies show that using fewer LoRA adapters may still maintain or even improve performance, but existing methods still distribute adapters broadly, leaving wh…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.8
2026-05-07 · Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier
General AI
Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.6
2026-05-11 · Mohammed M. H. Qazzaz, Syed Ali Zaidi, Aubida A. Al-Hameed, Abdelaziz Salama, Des Mclernon
General AI
This paper introduces a proactive Unmanned Aerial Vehicle (UAV) mobility management xApp for Open Radio Access Network (O-RAN) Near Real-Time Radio Intelligent Controller (Near-RT RIC) environments, employing Double Deep Q-Network (DDQN) reinforcement learning (RL) enhanced with transfer learning to optimise handover d…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.5
2026-02-27 · Abisheka Pitumpe, Amir Rahmati
Research Track B · General AI
Job-based smishing scams, where victims are recruited under the guise of remote job opportunities, represent a rapidly growing and understudied threat within the broader landscape of online fraud. In this paper, we present Anansi, the first scalable, end-to-end measurement pipeline designed to systematically engage wit…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-09 · Luozheng Qin, Jia Gong, Qian Qiao, Tianjiao Li, Li Xu, Haoyu Pan, Chao Qu, Zhiyu Tan, Hao Li
General AI
Unified multimodal models integrating visual understanding and generation face a fundamental challenge: visual generation incurs substantially higher computational costs than understanding, particularly for video. This imbalance motivates us to invert the conventional paradigm: rather than extending understanding-centr…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-13 · Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar, Lasha Koroshinadze, Nishit Anand, Zhifeng Kong, Siddharth Gururani, Sang-gil Lee, Jaehyeon Kim, Aya Aljafari, Chao-Han Huck Yang, Sungwon Kim, Ramani Duraiswami, Dinesh Manocha, Mohammad Shoeybi, Bryan Catanzaro, Ming-Yu Liu, Wei Ping
General AI
We present Audio Flamingo Next (AF-Next), the next-generation and most capable large audio-language model in the Audio Flamingo series, designed to advance understanding and reasoning over speech, environmental sounds and music. Compared to Audio Flamingo 3, AF-Next introduces: (i) a stronger foundational audio-languag…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-15 · Wangjie Gan, Miao Pan, Linbo Xi, Wenqi Zhang, Jintao Chen, Jianwei Yin, Xuhong Zhang
General AI
Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a speci…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-16 · Yixu Huang, Tinghui Zhu, Muhao Chen
General AI
Visual reasoning models (VRMs) have recently shown strong cross-modal reasoning capabilities by integrating visual perception with language reasoning. However, they often suffer from overthinking, producing unnecessarily long reasoning chains for any task. We attribute this issue to Reasoning Path Redundancy in visual…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-16 · Haoyi Sun, Xiaoxiao Wang, Ning Mao, Qian Wang, Lifu Mu, Wen Zheng, Tao Wei, Wei Chen
General AI
Vision-Language Models (VLMs) have shown remarkable capabilities in joint vision-language understanding, but their large scale poses significant challenges for deployment in resource-constrained scenarios. Knowledge Distillation (KD) offers a viable way to improve model capabilities without increasing model size or dat…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-17 · Qwen Team
General AI
In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor, Qwen3.5-Omni scales to hundreds of billions of parameters and supports a 256k context length. By leveraging a massive dataset comprising heterogeneous text-vision pairs…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-19 · Yuezhou Hu, Jintao Zhang
General AI
Autoregressive video diffusion is emerging as a promising paradigm for streaming video synthesis, with step distillation serving as the primary means of accelerating inference. Whether speculative decoding, the dominant acceleration strategy for large language models, can be effectively adapted to autoregressive video …
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-20 · Rongyuan Tan, Jue Zhang, Zhuozhao Li, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
General AI
Interpretability tools are increasingly used to analyze failures of Large Language Models (LLMs), yet prior work largely focuses on short prompts or toy settings, leaving their behavior on commonly used benchmarks underexplored. To address this gap, we study contrastive, LRP-based attribution as a practical tool for an…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-21 · Chaitanya Dwivedi, Binxuan Huang, Himanshu Gupta, Pratik Jayarao, Neeraj Varshney, Bing Yin
General AI
Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality scales predictably with total parameters, an…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-21 · Ying Zeng, Miaosen Luo, Guangyuan Li, Yang Yang, Ruiyang Fan, Linxiao Shi, Qirui Yang, Jian Zhang, Chengcheng Liu, Siming Zheng, Jinwei Chen, Bo Li, Peng-Tao Jiang
General AI
Traditional photographic image editing typically requires users to possess sufficient aesthetic understanding to provide appropriate instructions for adjusting image quality and camera parameters. However, this paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or i…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-04-24 · Harshit Joshi, Priyank Shethia, Jadelynn Dao, Monica S. Lam
General AI
Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections grow. A common workaround is to decompose documents into chunks and assemble answers from…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.5
2026-04-24 · Hillary Mutisya, John Mugane
Research Track A · General AI
We investigate whether neural models trained exclusively on modern morphological data can recover cross-lingual lexical structure consistent with historical reconstruction. Using BantuMorph v7, a transformer over Bantu morphological paradigms, we analyze 14 Eastern and Southern Bantu languages, extract encoder embeddin…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.5
2026-05-12 · Bo Yin, Qi Li, Xinchao Wang
General AI
Tool-using LLM agents fail through trajectories rather than only final responses, as they may execute unsafe tool calls, follow injected instructions, comply with harmful requests, or over-refuse benign tasks despite producing a seemingly safe answer. Existing safety-alignment signals are largely response-level or off-…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.4
2026-04-29 · Jun Guo, Qiwei Li, Peiyan Li, Zilong Chen, Nan Sun, Yifei Su, Heyun Wang, Yuan Zhang, Xinghang Li, Huaping Liu
General AI
We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM) that only model 2D pixel-space and fail to balance action effic…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.4
2026-04-30 · Abdelrahman Sadallah, Kareem Elozeiri, Mervat Abassy, Rania Elbadry, Mohamed Anwar, Abed Alhakim Freihat, Preslav Nakov, Fajri Koto
General AI
Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of expression and cultural identity. While modern Arabic speakers continue to value poetry, existing research on Arabic poetry within Large Language Models (LLMs) has primarily focused on analysis tasks such as interpretation or m…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.4
2026-05-01 · Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd, Nathaniel Li, Ziwen Han, Jean-Christophe Testud, Saisuke Okabayashi, Maeve Ryan, Jinpeng Miao, Hamza Kwisaba, Felix Binder, Spencer Whitman, Jim Gust, Esteban Arcaute, Dhaval Kapil, Jacob Kahn, Ayaz Minhas, Tristan Goodman, Lauren Deason, Alexander Vaughan, Shengjia Zhao, Summer Yue
General AI
This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned pro…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 8.4
2026-05-03 · Shuzheng Si, Haozhe Zhao, Yu Lei, Qingyi Wang, Dingwei Chen, Zhitong Wang, Zhenhailong Wang, Kangyang Luo, Zheng Wang, Gang Chen, Fanchao Qi, Minjia Zhang, Maosong Sun
General AI
Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-10 · Gyuwon Park, DongIl Shin, SolGil Oh, SangGi Ryu, Byung-Hak Kim
General AI
The rapid evolution of Large Language Models (LLMs) has significantly impacted the field of natural language processing, but their growing complexity raises concerns about resource usage and transparency. Addressing these challenges, we participated in the NeurIPS LLM Efficiency Challenge, aiming to fine-tune a foundat…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-13 · WonJin Yoon, Kangyu Zhu, Ian Bulovic, Autumn Sehy, Yanjun Gao, Dmitriy Dligach, Majid Afshar, Timothy A. Miller
Research Track A · General AI
With the recent progress of Large Language Models (LLMs), there is a growing interest in applying these models to solve complex and challenging problems. Modern LLMs, capable of processing long contexts and generating verbalized explanations, offer significant potential in addressing real-world applications. However, a…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-13 · Lyuxing He, Eric Cai, Shobhit Aggarwal, Jianjun Wang, David Held
General AI
Recent advances in robotic manipulation have highlighted the effectiveness of learning from demonstration. However, while end-to-end policies excel in expressivity and flexibility, they struggle both in generalizing to novel object geometries and in attaining a high degree of precision. An alternative, object-centric a…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-14 · Kathakoli Sengupta, Kai Ao, Paola Cascante-Bonilla
General AI
Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene graphs, yet evaluation still relies on LLM or VLM judges that score rendered views, making judgments sensitive to viewpoint, prompt phrasing, and hallucination. Wh…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-15 · Yarui Cao, Kai Liu
General AI
Fine-tuning large language models (LLMs) aims to adapt pre-trained models to specific tasks using relatively small and domain-specific datasets. Among Parameter-Efficient Fine-Tuning (PEFT) methods, Low-Rank Adaptation (LoRA) stands out by matching the performance of full fine-tuning while avoiding additional inference…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-16 · Giacomo Franchini, David Rodríguez-Martínez, Alfonso Martínez-Petersen, C. J. Pérez-del-Pulgar, Marcello Chiaberge
General AI
Autonomous robots operating in natural karstic caves face perception and navigation challenges that are qualitatively distinct from those encountered in mines or tunnels: irregular geometry, reflective wet surfaces, near-zero ambient light, and complex branching passages. Yet publicly available datasets targeting this …
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-16 · Yao Tong, Jiayuan Ye, Anastasia Borovykh, Reza Shokri
General AI
Whether language models can systematically generalize remains actively debated. Yet empirical performance is jointly shaped by multiple factors such as training data, training paradigms, and inference-time strategies, making failures difficult to interpret. We introduce a controlled synthetic environment based on short…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-16 · Mengdi Wu, Xiaoyu Jiang, Oded Padon, Zhihao Jia
General AI
This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constru…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-16 · Yiyang Jiang, Li Zhang, Xiao-Yong Wei, Li Qing
General AI
Many SLT systems quietly assume that brief chunks of signing map directly to spoken-language words. That assumption breaks down because signers often create meaning on the fly using context, space, and movement. We revisit SLT and argue that it is mainly a cross-modal reasoning task, not just a straightforward video-to…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-17 · Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin
General AI
We propose HILBERT (HIerarchical Long-sequence Balanced Embedding with Reciprocal contrastive Training), a cross-attentive multimodal framework for learning document-level audio-text representations from long, segmented sequences in low-resource data settings. HILBERT leverages frozen pre-trained speech and language en…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-17 · Xiangbo Gao, Sicong Jiang, Bangya Liu, Xinghao Chen, Minglai Yang, Siyuan Yang, Mingyang Wu, Jiongze Yu, Qi Zheng, Haozhi Wang, Jiayi Zhang, Jared Yang, Jie Yang, Zihan Wang, Qing Yin, Zhengzhong Tu
General AI
As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured footage to meet professional requirements. Yet the field still lacks both a large-scale human-annotated dataset with complete editing examples and a standardized evaluat…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-20 · Kevin Murphy
General AI
We present BLF (Bayesian Linguistic Forecaster), an agentic system for binary forecasting that achieves state-of-the-art performance on the ForecastBench benchmark. The system is built on three ideas. (1) A Bayesian linguistic belief state: a semi-structured representation combining numerical probability estimates with…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-20 · Sijie Mai, Shiqin Han
General AI
Multimodal affective computing aims to predict humans' sentiment, emotion, intention, and opinion using language, acoustic, and visual modalities. However, current models often learn spurious correlations that harm generalization under distribution shifts or noisy modalities. To address this, we propose a causal modali…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-21 · Yusuf Çelebi, Yağız Asker, Özay Ezerceli, Mahmoud ElHussieni, Selva Taş, Reyhan Bayraktar, Fatma Betül Terzioğlu
General AI
Fine-tuning Large Language Models (LLMs) remains structurally uncertain despite parameter-efficient methods such as Low-Rank Adaptation (LoRA), as the layer-specific roles of internal representations are poorly understood, leading to heuristic decisions about where adaptation should be applied. We model the evolution o…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-21 · Zewei Zhou, Ruining Yang, Xuewei Qi, Yiluan Guo, Sherry X. Chen, Tao Feng, Kateryna Pistunova, Yishan Shen, Lili Su, Jiaqi Ma
General AI
Vision-Language-Action (VLA) models offer a promising autonomous driving paradigm for leveraging world knowledge and reasoning capabilities, especially in long-tail scenarios. However, existing VLA models often struggle with the high latency in action generation using an autoregressive generation framework and exhibit …
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-22 · Mariano Barone, Francesco Di Serio, Roberto Moio, Marco Postiglione, Giuseppe Riccio, Antonio Romano, Vincenzo Moscato
General AI
Large Language Models (LLMs) are increasingly deployed in healthcare, yet their communicative alignment with clinical standards remains insufficiently quantified. We conduct a multidimensional evaluation of general-purpose and domain-specialized LLMs across structured medical explanations and real-world physician-patie…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-22 · Hyeonwoo Kim, Jeonghwan Kim, Kyungwon Cho, Hanbyul Joo
General AI
Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object categories, including complex dexterous manipulations that are difficult to capture with motion capture systems. While the rich interaction knowledge embedded in these…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-22 · Joachim Baumann, Vishakh Padmakumar, Xiang Li, John Yang, Diyi Yang, Sanmi Koyejo
General AI
AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild. The dataset currently contai…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-22 · Yiming Bian, Joshua M. Akey
General AI
The scalability of long-context large language models is fundamentally limited by the quadratic memory cost of exact self-attention, which often leads to out-of-memory (OOM) failures on modern hardware. Existing methods improve memory efficiency to near-linear complexity, while assuming that the full query, key, and va…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-23 · Yang Hu, Vladyslav Turlo
General AI
The FAIR principles have transformed how computational data and workflows are shared in materials research, yet existing repositories can only serve pre-computed entries -- broad coverage is perpetually incomplete and cannot adapt to new questions on demand. To address these challenges, we present OptiMat Alloys, a lar…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-24 · Yuqi Pan, Jinghao Zhuang, Yupeng Feng, Fangzhi Zhong, Siyu Ding, Xuerui Qiu, Shaowei Gu, Bohan Sun, Zhiyong Qin, Yibo Zhong, Lingtao Ouyang, Kun Yang, Zehao Liu, Yuhong Chou, Shurong Wang, Anjie Hu, Han Xu, Bo Xu, Guoqi Li
General AI
Scaling context length is reshaping large-model development, yet full-attention Transformers suffer from prohibitive computation and inference bottlenecks at long sequences. A key challenge is to design foundation models that maintain performance and long-context efficiency with minimal training overhead. We introduce …
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-24 · Keshav Ramji, Tahira Naseem, Ramón Fernandez Astudillo
General AI
While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate during inference. Non-verbal reasoning methods have emerged with shorter generation lengths by leveraging continuous representations, yet their performance lags behind verbalized CoT. We propose $\…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-27 · Aaryan Shah, Andrew Hines, Alexia Downs, Denis Bajet, Paulius Mui, Fabiano Araujo, Laura Offutt, Aida Rutledge, Elizabeth Jimenez
General AI
Objective. Clinical AI documentation systems require evaluation methodologies that are clinically valid, economically viable, and sensitive to iterative changes. Methods requiring expert review per scoring instance are too slow and expensive for safe, iterative deployment. We present a case-specific, clinician-authored…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-27 · Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin
General AI
Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input pro…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.3
2026-04-27 · Lixian Chen, Mingxuan Huang, Yanhui Chen, Junyi Lin, Yang Shi
General AI
Vision-language models transfer well in zero-shot settings, but at deployment the visual and textual branches often shift asymmetrically. Under this condition, entropy-based test-time adaptation can sharpen the fused posterior while increasing error, because an unreliable modality may still dominate fusion. We study th…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-27 · Zhiheng Liu, Weiming Ren, Xiaoke Huang, Shoufa Chen, Tianhong Li, Mengzhao Chen, Yatai Ji, Sen He, Jonas Schult, Belinda Zeng, Tao Xiang, Wenhu Chen, Ping Luo, Luke Zettlemoyer, Yuren Cong
General AI
Unified multimodal models typically rely on pretrained vision encoders and use separate visual representations for understanding and generation, creating misalignment between the two tasks and preventing fully end-to-end optimization from raw pixels. We introduce Tuna-2, a native unified multimodal model that performs …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-05-12 · Aleksandr Bredikhin, Philippe Lalanda, German Vega
General AI
Human Activity Recognition (HAR) is a core task in pervasive computing systems, where models must operate under strict computational constraints while remaining robust to heterogeneous and evolving deployment conditions. Recent advances based on Transformer architectures have significantly improved recognition performa…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-05-12 · Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, Weiyang Liu
General AI
We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular values throughout traini…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-05-12 · Yo Ehara
General AI
Automatic generation of educational materials using large language models (LLMs) is becoming increasingly common, but assigning difficulty levels to such materials still requires substantial human effort. LLM-as-a-Judge has therefore attracted attention, yet disagreement with human raters remains a major challenge. We …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-05-12 · Ariel Gera, Shir Ashury-Tahan, Gal Bloch, Ohad Eytan, Assaf Toledo
General AI
We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling e…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-05-12 · Yichen Zhang, Jun Li
General AI
The efficient operation of modern cellular networks hinges on the accurate analysis of spatio-temporal traffic data. Mastering these patterns is essential for core network functions, chiefly forecasting future load to pre-empt congestion and imputing missing values caused by sensor failures or transmission errors to en…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-04-30 · Anietta Weckauff, Yuchen Zhang, Maksym Andriushchenko
General AI
Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlation between harmful behavior and self-assessment in emergently misaligned models, it remains unclear how consistent this c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-04-30 · Yujun Wu, Dongxu Zhang, Xinchen Li, Jinhang Xu, Yiling Duan, Yumou Liu, Jiabao Pan, Xuanhe Zhou, Jingxuan Wei, Siyuan Li, Jintao Chen, Conghui He, Cheng Tan
General AI
Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one anothe…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-04-30 · Jeanne Monnier, Thomas George, Frédéric Guyard, Christèle Tarnec, Marios Kountouris
General AI
Fairness in machine learning remains challenging due to its ethical complexity, the absence of a universal definition, and the need for context-specific bias metrics. Existing methods still struggle with intersectionality, multiclass settings, and limited flexibility and generality. To address these gaps, we introduce …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-04-30 · Zeynep Okray, Nils Otto, Anna A. Cook, Clifford Talbot, Ashwin Miriyala, Martín Klappenbach, Ciara Stern, Kieran Desmond, Paola Vargas-Gutierrez, Scott Waddell
General AI
Associating multiple sensory cues with a single experience or object is a fundamental process that improves object recognition and memory performance. However, neural mechanisms that bind sensory features during learning and augment memory expression are unknown. Here we demonstrate multisensory appetitive and aversive…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-04-30 · Samuel Kiegeland, Vésteinn Snæbjarnarson, Tim Vieira, Ryan Cotterell
General AI
Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a unit underspecified. In practice, experimental stimuli are segmented into linguistically motivated units (e.g., words), while pretrained language models assign probability…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-05-01 · Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer
General AI
Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed.…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-05-01 · Alfredo Madrid-García, Miguel Rujas
General AI
Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls. Objective: To re…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-05-02 · Shuaipeng Zhou, Yu Zhang
General AI
Libraries of Low-Rank Adaptation (LoRA) adapters are becoming a practical by-product of parameter-efficient adaptation. Once such adapters accumulate, a natural question is no longer how to train one adapter for one task, but how to reuse an open pool of adapters for a new task given only a small support set. Prior wor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-05-04 · Mario Rodríguez Béjar, B. Romera-Paredes, Jose L. Hernández-Ramos
General AI
Modern fuzzers increasingly use Large Language Models (LLMs) to generate structured inputs, but LLM-driven fuzzing is sensitive to prompt initialization and sampling variance, which can reduce exploration efficiency and lead to redundant inputs. We present FunFuzz, a multi-island evolutionary fuzzing framework that run…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-05-04 · Shikhar Shukla
General AI
Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $\gamma$, which determines how many tokens the draft model proposes per step. Nearly all exis…
- Review
- pending
- Role
- unreviewed
- Read
- soon
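The speculation length γ in the entry above can be illustrated with a minimal sketch of *greedy* speculative decoding. It assumes deterministic toy draft and target models; `toy_target` and `toy_draft` are hypothetical stand-ins. This is the greedy-verification variant, not the rejection-sampling scheme used when sampling stochastically. The draft proposes γ tokens, and the target keeps the longest agreeing prefix plus one token of its own. As a result, the output matches what the target alone would decode greedily.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: Model, target: Model, gamma: int) -> List[int]:
    """Run one greedy speculative-decoding step and return the tokens to append."""
    # 1. The draft model proposes gamma tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(gamma):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model verifies each proposed position. A real system scores
    #    all gamma positions in one batched forward pass; we loop for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First disagreement: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token was accepted, so the target adds one bonus token.
    accepted.append(target(ctx))
    return accepted


def speculative_generate(prompt: List[int], draft: Model, target: Model,
                         gamma: int, max_new: int) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft, target, gamma))
    return out[: len(prompt) + max_new]


def greedy_generate(prompt: List[int], model: Model, max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


# Hypothetical toy models: the target counts up mod 10; the draft agrees
# except when the context length is a multiple of 5, where it guesses 0.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if len(ctx) % 5 == 0 else (ctx[-1] + 1) % 10
```

For example, `speculative_generate([1], toy_draft, toy_target, gamma=4, max_new=12)` returns the same list as `greedy_generate([1], toy_target, 12)`. A larger γ yields more tokens per target check when the draft agrees, but it wastes more draft work after an early disagreement. That trade-off is what makes γ worth tuning.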
huggingface
Score 8.0
2026-03-27 · Shihua Zhang, Qiuhong Shen, Shizun Wang, Tianbo Pan, Xinchao Wang
General AI
Empowered by large-scale training, vision-language models (VLMs) achieve strong image and video understanding, yet their ability to perform spatial reasoning in both static scenes and dynamic videos remains limited. Recent advances try to handle this limitation by injecting geometry tokens from pretrained 3D foundation…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-03-30 · Bharath Krishnamurthy, Ajita Rattani
General AI
Recent multimodal face generation models address the spatial control limitations of text-to-image diffusion models by augmenting text-based conditioning with spatial priors such as segmentation masks, sketches, or edge maps. This multimodal fusion enables controllable synthesis aligned with both high-level semantic int…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-02 · Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani
General AI
We consider the question: when a large language reasoning model makes a choice, did it think first and then decide to, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-05 · Xudong Lu, Yang Bo, Jinpeng Chen, Shuhan Li, Xintong Guo, Huankang Guan, Fang Liu, Dunyuan Xu, Peiwen Sun, Heyang Sun, Rui Liu, Hongsheng Li
General AI
Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress, yet current approach…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-06 · Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye
General AI
We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. Unlike traditional GPU-centric systems, MegaTrain stores parameters and optimizer states in host memory (CPU memory) and treats GPUs as transient compute engines. For each layer…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-08 · Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu
General AI
A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jointly shaped by opti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-22 · Haebin Seong, Li Yin, Haoran Zhang
General AI
AI agents are increasingly deployed on complex, domain-specific workflows -- navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-25 · Bingda Tang, Yuhui Zhang, Xiaohan Wang, Jiayuan Mao, Ludwig Schmidt, Serena Yeung-Levy
General AI
Aligning denoising generative models with human preferences or verifiable rewards remains a key challenge. While policy-gradient online reinforcement learning (RL) offers a principled post-training framework, its direct application is hindered by the intractable likelihoods of these models. Prior work therefore either …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-26 · Zhen Ye, Xu Tan, Aoxiong Yin, Hongzhan Lin, Guangyan Zhang, Peiwen Sun, Yiming Li, Chi-Min Chan, Wei Ye, Shikun Zhang, Wei Xue
General AI
Joint audio-video generation models have shown that unified generation yields stronger cross-modal coherence than cascaded approaches. However, existing models couple modalities throughout denoising via pervasive attention, treating high-level semantics and low-level details in a fully entangled manner. This is subopti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-27 · Bo Ni, Leyao Wang, Yu Wang, Branislav Kveton, Franck Dernoncourt, Yu Xia, Hongjie Chen, Reuben Leura, Samyadeep Basu, Subhojyoti Mukherjee, Puneet Mathur, Nesreen Ahmed, Junda Wu, Li Li, Huixin Zhang, Ruiyi Zhang, Tong Yu, Sungchul Kim, Jiuxiang Gu, Zhengzhong Tu, Alexa Siu, Zichao Wang, David Seunghyun Yoon, Nedim Lipka, Namyong Park, Zihao Lin, Trung Bui, Yue Zhao, Tyler Derr, Ryan A. Rossi
General AI
User simulation has long played a vital role in computer science due to its potential to support a wide range of applications. Language, as the primary medium of human communication, forms the foundation of social interaction and behavior. Consequently, simulating conversational behavior has become a key area of study.…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-28 · Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, Yuming Li, Xiaoxuan He, Mengzhao Chen, Haoyang Huang, Nan Duan, Ping Luo
General AI
While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deployment requirements due to critical issues such as prompt sensitivity, temporal inconsistency…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-04-28 · Jiayi Guo, Linqing Wang, Jiangshan Wang, Yang Yue, Zeyu Liu, Zhiyuan Zhao, Qinglin Lu, Gao Huang, Chunyu Wang
General AI
Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their initial generation, potentially extending the performance upper bound. Current UMM-based refinement methods primarily…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-05-06 · Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen, Jun Zhu
General AI
LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-05-07 · Ilya Borovik
General AI
Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-sc…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-05-07 · Ziyun Zeng, Yiqi Lin, Guoqiang Liang, Mike Zheng Shou
General AI
In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly available datasets predominantly focus on local editing or style transfer, which largely preserve the original scene structure and are easier to scale. In contrast, Backgroun…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-03-26 · Xiaofeng Mao, Shaohao Rui, Kaining Ying, Bo Zheng, Chuanhao Li, Mingmin Chi, Kaipeng Zhang
General AI
Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-03-31 · Max Kaufmann, David Lindner, Roland S. Zimmermann, Rohin Shah
General AI
Chain-of-Thought (CoT) monitoring, in which automated systems monitor the CoT of an LLM, is a promising approach for effectively overseeing AI systems. However, the extent to which a model's CoT helps us oversee the model - the monitorability of the CoT - can be affected by training, for instance by the model learning …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-03-31 · Nathan Heath
General AI
Myopic Optimization with Non-myopic Approval (MONA) mitigates multi-step reward hacking by restricting the agent's planning horizon while supplying far-sighted approval as a training signal (Farquhar et al., 2025). The original paper identifies a critical open question: how the method of constructing approval -- partic…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-03-31 · Gianluca Aguzzi, Davide Domini, Nicolas Farabegoli, Mirko Viroli
General AI
Aggregate programming is a field-based coordination paradigm with over a decade of exploration and successful applications across domains including sensor networks, robotics, and IoT, with implementations in various programming languages, such as Protelis, ScaFi (Scala), and FCPP (C++). A recent research direction inte…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-03-31 · Iain Swift, JingHua Ye, Ruairi O'Reilly
General AI
Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-03-31 · Qiyuan Zhuang, He-Yang Xu, Yijun Wang, Xin-Yang Zhao, Yang-Yang Li, Xiu-Shen Wei
General AI
Understanding object affordances is essential for enabling robots to perform purposeful and fine-grained interactions in diverse and unstructured environments. However, existing approaches either rely on retrieval, which is fragile due to sparsity and coverage gaps, or on large-scale models, which frequently mislocaliz…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-03-31 · Ming-Hua Tsai, Phat Tran
General AI
This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and e…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-03-31 · Iain Swift, JingHua Ye
General AI
Multimodal deep learning has improved prognostic accuracy for brain tumours by integrating histopathology and genomic data, yet the contribution of volumetric MRI within unified survival frameworks remains unexplored. This pilot study extends a bimodal framework by incorporating Fluid Attenuated Inversion Recovery (FLA…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-02 · Merve Karakas, Osama Hanna, Lin F. Yang, Christina Fragouli
General AI
In this paper, we consider a multi-armed bandit (MAB) instance and study how to identify the best arm when arm commands are conveyed from a central learner to a distributed agent over a discrete memoryless channel (DMC). Depending on the agent capabilities, we provide communication schemes along with their analysis, wh…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-02 · Daiwei Chen, Zhoutong Fu, Chengming Jiang, Haichao Zhang, Ran Zhou, Tan Wang, Chunnan Yao, Guoyao Li, Rui Cai, Yihan Cao, Ruijie Jiang, Fedor Borisyuk, Jianqiang Shen, Jingwei Wu, Ramya Korlakai Vinayak
General AI
Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies on supervised fine-tuning to learn their …
- Review
- pending
- Role
- unreviewed
- Read
- soon
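The entry describes a baseline in which each new vocabulary token is initialized as the mean of the existing embeddings. That baseline can be sketched in a few lines. The sketch is written in pure Python and assumes the embedding table is a list of rows. The function name is hypothetical, and this is not the paper's proposed method, whose details are truncated above.

```python
from typing import List

Matrix = List[List[float]]


def extend_with_mean_init(embeddings: Matrix, num_new: int) -> Matrix:
    """Return a copy of `embeddings` with `num_new` extra rows, each set to the mean row."""
    if not embeddings:
        raise ValueError("need at least one existing embedding to average")
    n, dim = len(embeddings), len(embeddings[0])
    # Column-wise mean over the existing vocabulary.
    mean = [sum(row[j] for row in embeddings) / n for j in range(dim)]
    # Copy the old rows so the caller's table is not mutated.
    old_rows = [list(row) for row in embeddings]
    # Every new token starts from the same vector; fine-tuning must pull them apart.
    new_rows = [list(mean) for _ in range(num_new)]
    return old_rows + new_rows
```

With `[[1.0, 2.0], [3.0, 4.0]]` and `num_new=2`, both appended rows are `[2.0, 3.0]`. The sketch makes one consequence visible: all new tokens begin indistinguishable, so any differentiation between them has to come from fine-tuning.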
arxiv
Score 7.8
2026-04-02 · Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano
General AI
We present ModMap, a natively multiview and multimodal framework for 3D anomaly detection and segmentation. Unlike existing methods that process views independently, our method draws inspiration from the crossmodal feature mapping paradigm to learn to map features across both modalities and views, while explicitly mode…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-06 · Daron Acemoglu, Tianyi Lin, Asuman Ozdaglar, James Siderius
General AI
Artificial intelligence (AI) changes social learning when aggregated outputs become training data for future predictions. To study this, we extend the DeGroot model by introducing an AI aggregator that trains on population beliefs and feeds synthesized signals back to agents. We define the learning gap as the deviation…
- Review
- pending
- Role
- unreviewed
- Read
- soon
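For reference, the entry extends the classic DeGroot update. In that model, each agent replaces its belief with a weighted average of its neighbours' beliefs, x(t+1) = W x(t), where W is a row-stochastic trust matrix. The sketch below covers only that baseline, not the paper's AI-aggregator extension. The trust matrix and initial beliefs are made-up illustrative values.

```python
from typing import List


def degroot_step(W: List[List[float]], beliefs: List[float]) -> List[float]:
    """One DeGroot round: agent i's new belief is sum_j W[i][j] * beliefs[j]."""
    return [sum(w * b for w, b in zip(row, beliefs)) for row in W]


def run_degroot(W: List[List[float]], beliefs: List[float], rounds: int) -> List[float]:
    # DeGroot requires each agent's trust weights to be non-negative and sum to 1.
    for row in W:
        if any(w < 0 for w in row) or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("W must be row-stochastic")
    for _ in range(rounds):
        beliefs = degroot_step(W, beliefs)
    return beliefs


# Illustrative three-agent line network: the middle agent listens to both ends.
W = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
```

Starting from `[1.0, 0.0, 0.0]`, the call `run_degroot(W, [1.0, 0.0, 0.0], 100)` approaches consensus at 0.25. This happens because W is strongly connected and aperiodic. For such a matrix, the limit is the average of the initial beliefs weighted by the left eigenvector of W with eigenvalue 1, which here is (1/4, 1/2, 1/4).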
arxiv
Score 7.8
2026-04-06 · Justin Curry, Alberto Speranzon
General AI
In this paper, we develop a stratification-based semantics for Signal Temporal Logic (STL) in which each atomic predicate is interpreted as a membership test in a stratified space. This perspective reveals a novel correspondence principle between stratification theory and STL, showing that most STL formulas can be view…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-07 · Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang
General AI
Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly sensitive to configuration choices. In practice, this leads to many concurrent LoRA …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-07 · Juyeong Hwang, Seong-Eun Hong, Jinhyun Kim, JaeYoung Seon, Giljoo Nam, Hanyoung Jang, HyeongYeop Kang
General AI
Crowds do not merely move; they decide. Human navigation is inherently contextual: people interpret the meaning of space, social norms, and potential consequences before acting. Sidewalks invite walking, crosswalks invite crossing, and deviations are weighed against urgency and safety. Yet most crowd simulation methods…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-07 · Tianyi Liu, Yiming Li, Wenqian Wang, Jiaojiao Wang, Chen Cai, Yi Wang, Kim-Hui Yap
General AI
Robust multimodal visual analytics remains challenging when heterogeneous modalities provide complementary but input-dependent evidence for decision-making. Existing multimodal learning methods mainly rely on fixed fusion modules or predefined cross-modal interactions, which are often insufficient to adapt to changing m…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-09 · Kooktae Lee
General AI
This paper addresses the decentralized non-uniform area coverage problem for multi-agent systems, a critical task in missions with high spatial priority and resource constraints. While existing density-based methods often rely on computationally heavy Eulerian PDE solvers or heuristic planning, we propose Stochastic De…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-09 · Joungbin An, Agrim Jain, Kristen Grauman
General AI
Video temporal grounding (VTG) is typically tackled with dataset-specific models that transfer poorly across domains and query styles. Recent efforts to overcome this limitation have adapted large multimodal language models (MLLMs) to VTG, but their high compute cost and limited video context still hinder long-video gr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-09 · Mohamed Amine Kerkouri, Marouane Tliba, Bin Wang, Aladine Chetouani, Ulas Bagci, Alessandro Bruno
General AI
Scanpath similarity metrics are central to eye-movement research, yet existing methods predominantly evaluate spatial and temporal alignment while neglecting semantic equivalence between attended image regions. We present a semantic scanpath similarity framework that integrates vision-language models (VLMs) into eye-tr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-28 · Sherzod Turaev, Mary John, Jaloliddin Rustamov, Zahiriddin Rustamov, Saja Aldabet, Nazar Zaki, Khaled Shuaib
General AI
Understanding learners' cognitive and affective states underpins adaptive educational systems and effective teaching. Although research links nonverbal cues to internal states, no framework calibrates them to evidence. We present the Nonverbal Syntax Framework, drawn from a systematic review of 908 studies and 17,043 c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-07 · Sushant Gautam, Finn Schwall, Annika Willoch Olstad, Fernando Vallecillos Ruiz, Birk Torpmann-Hagen, Sunniva Maria Stordal Bjørklund, Leon Moonen, Klas Pettersen, Michael A. Riegler
General AI
Many deployments must compare candidate language models for safety before a labeled benchmark exists for the relevant language, sector, or regulatory regime. We formalize this setting as benchmarkless comparative safety scoring and specify the contract under which a scenario-based audit can be interpreted as deployment…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-07 · Jai Moondra, Ayela Chughtai, Bhargavi Lanka, Swati Gupta
General AI
Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of the decisive votes c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
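The entry critiques a single global Bradley-Terry ranking. Such a ranking assigns each model a strength p_i, with P(i beats j) = p_i / (p_i + p_j). The sketch below fits those strengths from (winner, loser) votes using the classic minorization-maximization (Zermelo) iteration. It ignores ties, and the model names are hypothetical. It also assumes the comparison graph is strongly connected, which is the condition for the maximum-likelihood fit to exist. The code checks only the weaker condition that each model has at least one win and one loss.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fit_bradley_terry(votes: List[Tuple[str, str]], iters: int = 500) -> Dict[str, float]:
    """Fit Bradley-Terry strengths from (winner, loser) pairs; strengths sum to 1."""
    wins: Dict[str, float] = defaultdict(float)
    losses: Dict[str, float] = defaultdict(float)
    pair_counts: Dict[Tuple[str, ...], float] = defaultdict(float)  # unordered pair -> games
    for winner, loser in votes:
        wins[winner] += 1
        losses[loser] += 1
        pair_counts[tuple(sorted((winner, loser)))] += 1
    models = sorted(set(wins) | set(losses))
    # Weak sanity check; a unique MLE needs a strongly connected comparison graph.
    for m in models:
        if wins[m] == 0 or losses[m] == 0:
            raise ValueError(f"{m} needs at least one win and one loss")

    p = {m: 1.0 / len(models) for m in models}
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j), using the previous p.
        denom = {m: 0.0 for m in models}
        for (a, b), n in pair_counts.items():
            share = n / (p[a] + p[b])
            denom[a] += share
            denom[b] += share
        new_p = {m: wins[m] / denom[m] for m in models}
        total = sum(new_p.values())
        p = {m: v / total for m, v in new_p.items()}
    return p


def win_probability(p: Dict[str, float], i: str, j: str) -> float:
    """Bradley-Terry probability that model i is preferred over model j."""
    return p[i] / (p[i] + p[j])
```

Consider votes in which `model-a` beats `model-b` three times and loses once. The fit gives strengths of 0.75 and 0.25, so `win_probability(p, "model-a", "model-b")` is 0.75. That equals the empirical win rate, as expected with only two players. With many models and heterogeneous votes, one global set of strengths like this can hide systematic disagreements between voter subgroups.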
huggingface
Score 7.5
2026-04-13 · Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu
General AI
Diffusion language models (DLMs) promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-13 · Efstathios Karypidis, Spyros Gidaris, Nikos Komodakis
General AI
Accurate future video prediction requires both high visual fidelity and consistent scene semantics, particularly in complex dynamic environments such as autonomous driving. We present Re2Pix, a hierarchical video prediction framework that decomposes forecasting into two stages: semantic representation prediction and re…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-15 · Akira Kawabata, Saku Sugawara
General AI
Rubric-augmented verification guides reward models with explicit evaluation criteria, yielding more reliable judgments than single-model verification. However, most existing methods require costly rubric annotations, limiting scalability. Moreover, we find that rubric generation is vulnerable to a failure of cooperatio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-17 · Jiaxi Bi, Tongxu Luo, Wenyu Du, Zhengyang Tang, Benyou Wang
General AI
Parallel reasoning enhances Large Reasoning Models (LRMs) but incurs prohibitive costs due to futile paths caused by early errors. To mitigate this, path pruning at the prefix level is essential, yet existing research remains fragmented without a standardized framework. In this work, we propose the first systematic tax…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.4
2026-04-29 · Ming Li, Jie Wu, Justin Cui, Xiaojie Li, Rui Wang, Chen Chen
General AI
While preference optimization is crucial for improving visual generative models, how to effectively scale this paradigm remains largely unexplored. Current open-source preference datasets contain conflicting preference patterns, where winners excel in some dimensions but underperform in others. Naively optimizing on su…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.4
2026-04-30 · Ansar Aynetdinov, Patrick Haller, Alan Akbik
General AI
Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for high-resource non-English languages like German, French, or Japanese, aggressive filtering creates a strategic dilemma: should practitioners prioritize diversity by tra…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.4
2026-05-04 · Sinan Wang, Jinjin He, Shenyifan Lu, Ruicheng Wang, Greg Turk, Bo Zhu
General AI
We present Orbit-Space Geometric Probability Paths (OGPP), a particle-native flow-matching framework for generative modeling of particle systems. OGPP is motivated by two insights: (i) particles are defined up to permutation symmetries, so anonymous indexing inflates per-index target variance and yields curved, hard-to…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-09 · Hananel Hazan, Yanbo Zhang, Benedikt Hartl, Michael Levin
General AI
How many of a neural network's parameters actually encode task-specific information? We investigate this question with LottaLoRA, a training paradigm in which every backbone weight is drawn at random and frozen; only low-rank LoRA adapters are trained. Across nine benchmarks spanning diverse architecture families from …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-10 · Harshith Kethavath, Weiming Hu
General AI
Adapting vision-language models to remote sensing imagery presents a fundamental challenge: both the visual and linguistic distributions of satellite data lie far outside natural image pretraining corpora. Despite this, prompting remains the dominant deployment paradigm, driven by the assumption that domain-specific la…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-12 · Rahul Ahuja, Mudit Jain, Bala Murali Manoghar Sai Sudhakar, Venkatraman Narayanan, Pratik Likhar, Varun Ravi Kumar, Senthil Yogamani
General AI
Vision foundation models (VFMs) and Bird's Eye View (BEV) representation have advanced visual perception substantially, yet their internal spatial representations assume the rectilinear geometry of pinhole cameras. Fisheye cameras, widely deployed on production autonomous vehicles for their surround-view coverage, exhi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-13 · Wanli Ma, Sivasakthy Selvakumaran, Dain G. Farrimond, Adam A. Dennis, Samuel E. Rigby
General AI
Accurate and rapid structural damage assessment (SDA) is crucial for post-disaster management, helping responders prioritise resources, plan rescues, and support recovery. Traditional field inspections, though precise, are limited by accessibility, safety risks, and time constraints, especially after large explosions. …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-13 · Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono
General AI
Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep learning-based auto-segmentation can reduce workload, safe clinical de…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-13 · Yuto Harada, Hiro Taiyo Hamada
General AI
Using psychological constructs such as the Big Five, large language models (LLMs) can imitate specific personality profiles and predict a user's personality. While LLMs can exhibit behaviors consistent with these constructs, it remains unclear where and how they are represented inside the model and how they relate to b…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-14 · Jian Han, Jinlai Liu, Jiahuan Wang, Bingyue Peng, Zehuan Yuan
General AI
While diffusion models dominate the field of visual generation, they are computationally inefficient, applying a uniform computational effort regardless of differences in complexity. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but are often hindered by…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-14 · Cristian Minoccheri, Emily Wittrup, Kayvan Najarian, Ryan Stidham
General AI
Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (IBD), yet the representational choices that best support automated analysis of this modality are unknown. We present the first study of vision-language transfer learning on abdominal CT enterography and identif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-14 · Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding
General AI
On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We first identify that two conditions govern whether OPD succeeds or fails: (i) the s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-16 · Jack Wei Lun Shi, Minghao Dang, Wawan Solihin, Justin K. W. Yeoh
General AI
Existing research on large language models (LLMs) for automated code compliance has primarily focused on performance, treating the models as black boxes and overlooking how training decisions affect their interpretive behavior. This paper addresses this gap by employing a perturbation-based attribution analysis to comp…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-17 · Yeganeh Abdollahinejad, Ahmad Mousavi, Naeemul Hassan, Kai Shu, Nathalie Japkowicz, Shahriar Khosravi, Amir Karami
General AI
The widespread dissemination of multimodal content on social media has made misinformation detection increasingly challenging, as misleading narratives often arise not only from textual or visual content alone, but also from semantic inconsistencies between modalities and their evolution over time. Existing multimodal …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-17 · Hitesh Mehta, Arjit Saxena, Garima Chhikara, Rohit Kumar
Research Track A · General AI
This paper explores the response of Large Language Models (LLMs) to user prompts with different degrees of politeness and impoliteness. The Politeness Theory by Brown and Levinson and the Impoliteness Framework by Culpeper form the basis of experiments conducted across three languages (English, Hindi, Spanish), five mo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-17 · Pritesh Jha
General AI
We present PIIBench, a unified benchmark corpus for Personally Identifiable Information (PII) detection in natural language text. Existing resources for PII detection are fragmented across domain-specific corpora with mutually incompatible annotation schemes, preventing systematic comparison of detection systems. We co…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-17 · Haoran Feng, Yifan Niu, Zehuan Huang, Yang-Tian Sun, Chunchao Guo, Yuxin Peng, Lu Sheng
General AI
We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in the native 3D space, formulating layout generation as an autoregressive process that explicitly models geometric rela…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-17 · Maks Pečnik Bambič, Nuno A. M. Araújo, Giorgio Volpe
General AI
Collective rotations are common in active matter, enhancing cohesion, transport, and mixing. They are typically attributed to chiral non-reciprocal dynamics due to intrinsic particle chirality, torque-generating interactions among units, or geometric confinement. Here, we uncover a different mechanism for rotational or…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-19 · Nwe Ni Win, Jim Basilakis, Steven Thomas, Seyhan Yazar, Laura Pierce, Stephanie Liu, Paul M. Middleton, Nasser Ghadiri, X. Rosalind Wang
General AI
Extracting clinically relevant information from unstructured medical narratives such as admission notes, discharge summaries, and emergency case histories remains a challenge in clinical natural language processing (NLP). Medical Entity Recognition (MER) identifies meaningful concepts embedded in these records. Recent …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-20 · Hao Meng, Siyuan Zheng, Shuran Zhou, Qiangqiang Wang, Yang Song
General AI
Large Language Models (LLMs) show promise in lyric-to-melody generation, but models trained with Supervised Fine-Tuning (SFT) often produce musically implausible melodies with issues like poor rhythm and unsuitable vocal ranges, a phenomenon we term "constraint violation". To address this, we propose a novel alignment …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-20 · Yunke Ao, Le Chen, Bruce D. Lee, Assefa S. Wahd, Aline Czarnobai, Philipp Fürnstahl, Bernhard Schölkopf, Andreas Krause
General AI
Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying foundations of trust region methods and the heuristic clipped objective used in P…
- Review
- pending
- Role
- unreviewed
- Read
- soon
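For reference, the "heuristic clipped objective" mentioned above is the standard PPO clipped surrogate, $L^{\text{CLIP}} = \mathbb{E}_t\left[\min\left(r_t A_t,\ \operatorname{clip}(r_t, 1-\epsilon, 1+\epsilon) A_t\right)\right]$ with $r_t = \pi_\theta(a_t \mid s_t) / \pi_{\theta_\text{old}}(a_t \mid s_t)$. The sketch below computes it from log-probabilities. It illustrates plain PPO only, not the trust-region alternative the (truncated) abstract goes on to propose, and the default `eps=0.2` is simply a commonly used value.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate L^CLIP (to be maximized), as in standard PPO."""
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have equal length")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) bound
    return total / len(advantages)

# Identical policies: ratio is 1, so the objective reduces to the mean advantage.
print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, -2.0]))
```

The `min` makes the clip asymmetric: with a positive advantage the gain from raising the ratio is capped at `1 + eps`, but with a negative advantage the unclipped (worse) value is kept, which is the heuristic behavior the abstract contrasts with trust-region theory.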
arxiv
Score 7.3
2026-04-20 · Terence Lim, Kumar Muthuraman, Michael Sury
General AI
We introduce a multi-agent framework intended to emulate parts of a quantitative research team and support equity factor research on large financial panel datasets. QRAFTI integrates a research toolkit for panel data with MCP servers that expose data access, factor construction, and custom coding operations as callable…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-21 · Alex Lin, Lei Gao, Narsimlu Kemsaram, Sriram Subramanian
General AI
AcoustoBots are mobile acoustophoretic robots capable of delivering mid-air haptics, directional audio, and acoustic levitation, but existing implementations rely on scripted commands and lack an intuitive interface for real-time human control. This work presents a gesture-based visual learning framework for contactles…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-21 · Robert Stanley, Avi Verma, Lillian Tsai, Konstantinos Kallas, Sam Kumar
General AI
AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g., personal and financial information). This poses a serious risk to security and privacy. Adversaries may attack the AI model (e.g., via prompt injection) to exfiltrate user da…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-22 · Casey Crane
General AI
We study the emergence of symmetric oscillatory behavior in multi-agent systems where each agent incorporates a continuous memory of its past states and past rates of change, modeled by distributed retarded and neutral delays. The closed-loop dynamics are described by a system of nonlinear neutral functional differenti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-22 · Thorsten Hoeser, Felix Bachofer, Claudia Kuenzer
General AI
The offshore wind energy sector is expanding rapidly, increasing the need for independent, high-temporal-resolution monitoring of infrastructure deployment and operation at global scale. While Earth Observation based offshore wind infrastructure mapping has matured for spatial localization, existing open datasets lack …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-22 · Travis LaCroix
General AI
The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but w…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-22 · Ruohan Liu, Shukang Yin, Tao Wang, Dong Zhang, Weiji Zhuang, Shuhuai Ren, Ran He, Caifeng Shan, Chaoyou Fu
General AI
Paralinguistic cues are essential for natural human-computer interaction, yet their evaluation in Large Audio-Language Models (LALMs) remains limited by coarse feature coverage and the inherent subjectivity of assessment. To address these challenges, we introduce SpeechParaling-Bench, a comprehensive benchmark for para…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-23 · Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil, Sergio Burdisso, Petr Motlicek, Shiran Liu, Mickael Rouvier, Jane Wottawa, Richard Dufour
General AI
Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This paper evaluates their …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-23 · D. Pauli, T. N. Parsons, R. K. Prinja
General AI
Massive stars with their strong ionizing radiation and strong stellar winds are the key feedback agents of the universe. Stellar winds of massive stars are often measured by fitting resonance lines in the UV using non-LTE stellar atmosphere models. So far, the line formation regions of these lines have not been measure…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-23 · Bartosz Balis, Michal Orzechowski, Piotr Kica, Michal Dygas, Michal Kuszewski
General AI
Scientific workflow systems automate execution -- scheduling, fault tolerance, resource management -- but not the semantic translation that precedes it. Scientists still manually convert research questions into workflow specifications, a task requiring both domain knowledge and infrastructure expertise. We propose an a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-23 · Yuto Nishida, Naoki Shikoda, Yosuke Kishinami, Ryo Fujii, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe
General AI
Understanding what kinds of factual knowledge large language models (LLMs) memorize is essential for evaluating their reliability and limitations. Entity-based QA is a common framework for analyzing non-verbatim memorization, but typical evaluations query each entity using a single canonical surface form, making it dif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-23 · Jiahui Liang, Shuoyao Wang, Shijian Gao
General AI
Efficient beam alignment is fundamental to high-throughput and reliable connectivity in Vehicle-to-Everything (V2X) systems. However, conventional beam management in dynamic vehicular topologies incurs prohibitive alignment overhead and struggles to maintain robust links under rapid mobility. To overcome these challeng…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-23 · Haolin Zhang, William Reber, Yuxuan Zhang, Guofei Gu, Jeff Huang
General AI
Modern phishing campaigns increasingly evade snapshot-based URL classifiers using interaction gates (e.g., checkbox/slider challenges), delayed content rendering, and logo-less credential harvesters. This shifts URL triage from static classification toward an interactive forensics task: an analyst must actively navigat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-24 · Sheza Munir, Ratna Kandala, Anamta Khan, Deepti, Joyojeet Pal
General AI
Health misinformation remains one of the most pressing challenges on social media, particularly when cultural traditions intersect with scientific-sounding claims. These dynamics are not only global but also deeply local, manifesting in culturally specific controversies that require careful analysis. Motivated by this,…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-24 · Md Erfan, Md Kamal Hossain Chowdhury, Ahmed Ryan, Md Rayhanur Rahman
General AI
Large Language Models (LLMs) show promise in automated software engineering, yet their guarantee of correctness is frequently undermined by erroneous or hallucinated code. To enforce model honesty, formal verification requires LLMs to synthesize implementation logic alongside formal specifications that are subsequently…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-24 · Xiang Zhang, Xiaotian Li, Taoyue Wang, Nan Bi, Xin Zhou, Cody Zhou, Zoie Wang, Andrew Yang, Yuming Su, Jeff Cohn, Qiang Ji, Lijun Yin
General AI
Social interactions dominate our perceptions of the world and shape our daily behavior by attaching social meaning to acts as simple and spontaneous as gestures, facial expressions, voice, and speech. People mimic and otherwise respond to each other's postures, facial expressions, mannerisms, and other verbal and nonve…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-24 · Jay Yu, Shunfan Zhou, Hang Yin, Brian Seong
General AI
Blockchain wallets conventionally follow an ownership model where possession of a private key grants unilateral control. However, this assumption is brittle for emerging settings such as AI agent wallets, organizational custody, and enterprise payroll, where multiple actors must coordinate without exposing secrets or l…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-24 · Ilana Nguyen, Harini Suresh, Thema Monroe-White, Evan Shieh
General AI
Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. While many works highlight the new potential applications of LLMs, there are risks of LLMs encoding and perpetuating h…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-27 · German Marin, Jatin Chaudhary
General AI
Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the Informational Viability Principle: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) +…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-27 · Hermawan Manurung, Ibrahim Al-Kahfi, Ahmad Rizqi, Martin Clinton Tosima Manullang
General AI
Indonesian marketplace reviews mix standard vocabulary with slang, regional loanwords, numeric shorthands, and emoji, making lexicon-based sentiment tools unreliable in practice. This paper describes a two-track classification pipeline applied to the PRDECT-ID dataset, which contains 5,400 product reviews from 29 Indon…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-05-12 · Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang, Alborz Geramifard
General AI
In settings where labeled verifiable training data is the binding constraint, each checked example should be allocated carefully. The standard practice is to use this data directly on the model that will be deployed, for example by running GRPO on the deployment student. We argue that this is often an inefficient alloc…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 7.3
2026-05-12 · Yihao Meng, Zichen Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yue Yu, Hanlin Wang, Haobo Li, Jiapeng Zhu, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen, Huamin Qu
General AI
Autoregressive video generation aims at real-time, open-ended synthesis. Yet, cinematic storytelling is not merely the endless extension of a single scene; it requires progressing through evolving events, viewpoint shifts, and discrete shot boundaries. Existing autoregressive models often struggle in this setting. Trai…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 7.3
2026-05-12 · Christen Millerdurai, Shaoxiang Wang, Yaxu Xie, Vladislav Golyanik, Didier Stricker, Alain Pagani
General AI
Reconstructing the absolute 3D pose and shape of the hands from the user's viewpoint using a single head-mounted camera is crucial for practical egocentric interaction in AR/VR, telepresence, and hand-centric manipulation tasks, where sensing must remain compact and unobtrusive. While monocular RGB methods have made pr…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 7.3
2026-05-12 · Vedang Lad, Katrin Franke, Tamar Rott Shaham, Surya Ganguli, Andreas S. Tolias, Sophia Sanborn, Nikos Karantzas
General AI
Understanding what individual neurons encode is a core question in neuroscience. In primary visual cortex (V1), mathematical models (e.g., Gabor functions) capture neural selectivity, but no comparable framework exists for higher areas. We show that natural language can fill this role: across macaque V1 and V4, the sel…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 7.2
2026-04-29 · Ezel Üsten, Anna Sieben, Mohcine Chraibi, Armin Seyfried
General AI
In pedestrian dynamics, the internal drive that propels individuals toward their goals is typically captured by a single, fixed parameter, the desired walking speed. This simplification overlooks that motivation fluctuates in response to changing spatial and social conditions within a crowd. This paper proposes a dynam…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-04-29 · Yeheng Chen, Chaoxiang Xie, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, Xiaodong Gu
General AI
LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes -- compositional code creation, i.e., building a complete, internally structured class from a specification -- remains underserved. Current evaluations a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-04-29 · Md Biplob Hosen, Md Alomgeer Hussein, Md Akmol Masud, Omar Faruque, Tera L Reynolds, Lujie Karen Chen
General AI
Patient portals now give individuals direct access to their electronic health records (EHRs), yet access alone does not ensure patients understand or act on the complex clinical information contained in these records. The ArchEHR-QA 2026 shared task addresses this challenge by focusing on grounded question answering ov…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-04-29 · Dimitris Dimakopoulos, Shay B. Cohen, Ioannis Konstas
General AI
Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next token prediction. Subsequent stages of post-training often introduce new facts outwith the parametric knowledge, giving rise to hallucinations. While it has been demonstrated that supervised fine-tuning (SFT…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-04-30 · Himanshu Pandey, Ratikanta Behera
General AI
In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer from two fundamental limitations, namely, spectral bias inherent in neural networks and loss imbalance arising from multiscale phenomena. This paper proposes an adaptive w…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-04-30 · Vinayak Gupta, Chih-Hao Lin, Shenlong Wang, Anand Bhattad, Jia-Bin Huang
General AI
Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and fails under sparse v…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-05-01 · Zihao Ding, Beining Wu, Jun Huang
General AI
Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private. However, joint embedding training entangles forgotten knowledge across both modalities and client gradient subspaces, hindering federated unlearning. Previous federated unlearning appr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-05-01 · Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb
General AI
Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based methods, including late fusion of image feat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-05-01 · Shradha Sharma, Swapnil Dhamal, Shweta Jain
General AI
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-05-04 · Anahita Golrang, Kshitij Sharma, Olga Viberg
General AI
Effective pair programming depends on coordination of attention, cognitive effort, and joint regulation over time, yet most adaptive learning systems remain individual-centric and reactive. This paper introduces ProPACT, a proactive AI-driven adaptive collaborative tutor that treats collaboration itself as the object o…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-05-05 · Alexander Vedernikov
General AI
Engagement estimation from face video remains challenging because facial evidence is often incomplete, labeled data are limited, and engagement annotations are subjective. We present PriorNet, a prior-guided framework that injects task-relevant priors at three stages of the pipeline: preprocessing, model adaptation, an…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 7.0
2026-03-23 · Yuze Qin, Qingyong Li, Zhiqing Guo, Wen Wang, Yan Liu, Yangli-ao Geng
General AI
Precipitation nowcasting is critical for disaster mitigation and aviation safety. However, radar-only models frequently suffer from a lack of large-scale atmospheric context, leading to performance degradation at longer lead times. While integrating meteorological variables predicted by weather foundation models offers…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-03-25 · Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi
General AI
Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.0
2026-03-26 · Mohamed Eltahir, Ahmed O. Ibrahim, Obada Siralkhatim, Tabarak Abdallah, Sondos Mohamed
Research Track A · General AI
Vision-Language Models (VLMs) are powerful open-set reasoners, yet their direct use as anomaly detectors in video surveillance is fragile: without calibrated anomaly priors, they alternate between missed detections and hallucinated false alarms. We argue the problem is not the VLM itself but how it is used. VLMs should…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-03-26 · Han Zhang, Guo-Hua Yuan, Chaohao Yuan, Tingyang Xu, Tian Bian, Hong Cheng, Wenbing Huang, Deli Zhao, Yu Rong
General AI
Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-03-30 · Ryan Po, David Junhao Zhang, Amir Hertz, Gordon Wetzstein, Neal Wadhwa, Nataniel Ruiz
General AI
Video world models have shown immense promise for interactive simulation and entertainment, but current systems still struggle with two important aspects of interactivity: user control over the environment for reproducible, editable experiences, and shared inference where players hold influence over a common world. To …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-04-06 · Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu, Xiangyan Liu, Zhenlong Yuan, Tianyu Pang, Michael Qizhe Shieh, Fengze Liu, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
General AI
OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-04-07 · Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu
General AI
We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific beha…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-04-27 · Zhongjie Duan, Hong Zhang, Yingda Chen
General AI
Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats, and runtime hooks. This fragmentation makes it difficult to reuse infrastructure across t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-04-27 · Emaan Bilal Khan, Amy Winecoff, Miranda Bogen, Dylan Hadfield-Menell
General AI
Foundation models are routinely fine-tuned for use in particular domains, yet safety assessments are typically conducted only on base models, implicitly assuming that safety properties persist through downstream adaptation. We test this assumption by analyzing the safety behavior of 100 models, including widely deploye…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-05-06 · Liang Yao, Fan Liu, Shengxiang Xu, Chuanyi Zhang, Rui Min, Shimin Di, Yuhui Zheng
General AI
Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has liberated the reasoning path from manual curation, allowing models to generate their own inference chains. Yet a final dependency remains: they are still sup…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.8
2026-03-23 · Alexandra Zelenin, Alexandra Zhuravlyova
General AI
Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction, but its forward pass requires the row-wise norm of W + sBA, a computation that every major framework we surveyed implements by materializing the dense [d_out, d_in] product BA. At d_in = 8192 and rank r = 384, a sin…
- Review
- pending
- Role
- unreviewed
- Read
- now
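The DoRA entry above notes that frameworks compute the row-wise norm of W + sBA by materializing the dense [d_out, d_in] product BA. One way to avoid that temporary, shown here as a sketch and not necessarily the paper's method, is to expand the squared norm: $\lVert W_i + s B_i A \rVert^2 = \lVert W_i \rVert^2 + 2s\, B_i (A W_i^\top) + s^2\, B_i (A A^\top) B_i^\top$, where $W_i$ and $B_i$ are the $i$-th rows. The layouts assumed are `W` of shape `(d_out, d_in)`, `B` of shape `(d_out, r)`, and `A` of shape `(r, d_in)`; the function names are hypothetical.

```python
import numpy as np

def row_norms_dense(W, A, B, s):
    # Reference: materializes the full (d_out, d_in) matrix W + s * B @ A.
    return np.linalg.norm(W + s * (B @ A), axis=1)

def row_norms_factored(W, A, B, s):
    # Largest temporaries are (d_out, r) and (r, r); no (d_out, d_in) product is formed.
    w_sq = np.einsum("ij,ij->i", W, W)               # ||W_i||^2
    cross = np.einsum("ir,ir->i", B, W @ A.T)        # B_i . (A W_i^T)
    gram = A @ A.T                                   # (r, r) Gram matrix of A's rows
    low_rank_sq = np.einsum("ir,ir->i", B @ gram, B) # B_i (A A^T) B_i^T
    sq = w_sq + 2.0 * s * cross + s * s * low_rank_sq
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.sqrt(np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
d_out, d_in, r, s = 64, 128, 8, 0.5
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
dense = row_norms_dense(W, A, B, s)
factored = row_norms_factored(W, A, B, s)
```

The expansion trades memory for a possible loss of precision: in low-precision formats the three terms can cancel, so the clamp (or accumulating in higher precision) matters in practice.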
arxiv
Score 6.8
2026-03-25 · Yupei Li, Shuaijie Shao, Manuel Milling, Björn Schuller
General AI
Current audio deepfake detection has achieved remarkable performance using diverse deep learning architectures such as ResNet, and has seen further improvements with the introduction of large models (LMs) like Wav2Vec. The success of large language models (LLMs) further demonstrates the benefits of scaling model parame…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-26 · Mingmeng Geng, Yuhang Dong, Thierry Poibeau
General AI
Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Due to the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-26 · Jiabin Hua, Hengyuan Xu, Aojie Li, Wei Cheng, Gang Yu, Xingjun Ma, Yu-Gang Jiang
General AI
Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off b…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-26 · Haoyan Yang, Mario Xerri, Solha Park, Huajian Zhang, Yiyang Feng, Sai Akhil Kogilathota, Jiawei Zhou
General AI
As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for further improvement. …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-30 · Jack Cook, Hyemin S. Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P. Chandrakasan, Song Han
General AI
NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distributio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-30 · Anuj Diwan, Eunsol Choi, David Harwath
General AI
We introduce ParaSpeechCLAP, a dual-encoder contrastive model that maps speech and text style captions into a common embedding space, supporting a wide range of intrinsic (speaker-level) and situational (utterance-level) descriptors (such as pitch, texture and emotion) far beyond the narrow set handled by existing mode…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-30 · Oliver Aleksander Larsen, Mahyar T. Moghaddam
General AI
Modern distributed systems integrate heterogeneous services (REST APIs with different schema versions, GraphQL endpoints, and IoT devices with proprietary payloads) that suffer from persistent schema mismatches. Traditional static adapters require manual coding for every schema pair and cannot handle novel combinations …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-30 · Aur Shalev Merin
General AI
Recurrent networks do not need Jacobian propagation to adapt online. The hidden state already carries temporal credit through the forward pass; immediate derivatives suffice if you stop corrupting them with stale trace memory and normalize gradient scales across parameter groups. An architectural rule predicts when nor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-30 · Chengyin Hu, Jiaju Han, Xuemeng Sun, Qike Zhang, Yiwei Wei, Ang Li, Chunlei Meng, Xiang Chen, Jiahuan Long
General AI
Vision-language models (VLMs) rely on a shared visual-textual representation space to perform tasks such as zero-shot classification, image captioning, and visual question answering (VQA). While this shared space enables strong cross-task generalization, it may also introduce a common vulnerability: small visual pertur…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-31 · Breno C. Bispo, Stefania Sardellitti, Juliano B. Lima, Fernando A. N. Santos
General AI
Brain connectomics is still largely dominated by pairwise-based models, such as graphs, which cannot represent circulatory or higher-order functional interactions. In this paper, we propose a multimodal framework based on Topological Signal Processing (TSP) that models the brain as a higher-order topological domain and…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-31 · Sowmya Vajrala, Aakash Parmar, Prasanna R, Sravanth Kodavanti, Manjunath Arveti, Srinivas Soumitri Miriyala, Ashok Senapati
General AI
Generative Artificial Intelligence (GenAI) features such as image editing, object removal, and prompt-guided image transformation are increasingly integrated into mobile applications. However, deploying Large Vision Models (LVMs) for such tasks on resource-constrained devices remains challenging due to their high memor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-31 · Timon Klein, Jonas Kusch, Sebastian Sager, Stefan Schnake, Steffen Schotthöfer
General AI
The pursuit of reducing the memory footprint of multi-head self-attention (MHA) has spawned a rich portfolio of methods, e.g., grouped-query attention (GQA) and multi-head latent attention (MLA). These methods leverage specialized low-rank factorizations across embedding dimensions or attentio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-02 · Abhilash Kar, Basisth Saha, Tanmay Sen, Biswabrata Pradhan
General AI
Multimodal time-to-event prediction often requires integrating sensitive data distributed across multiple parties, making centralized model training impractical due to privacy constraints. At the same time, most existing multimodal survival models produce single deterministic predictions without indicating how confiden…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-02 · Sten Rüdiger, Sebastian Raschka
General AI
Minor Component Adaptation (MiCA) is a novel parameter-efficient fine-tuning method for large language models that focuses on adapting underutilized subspaces of model representations. Unlike conventional methods such as Low-Rank Adaptation (LoRA), which target dominant subspaces, MiCA leverages Singular Value Decompos…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-02 · Hao Zhu, Di Zhou, Donna Slonim
General AI
Understanding causal dependencies in observational data is critical for informing decision-making. These relationships are often modeled as Bayesian Networks (BNs) and Directed Acyclic Graphs (DAGs). Existing methods, such as NOTEARS and DAG-GNN, often face issues with scalability and stability in high-dimensional data…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-06 · Dawar Khan, Alexandre Kouyoumdjian, Xinyu Liu, Omar Mena, Dominik Engel, Ivan Viola
General AI
We present ClickAIXR, a novel on-device framework for multimodal vision-language interaction with objects in extended reality (XR). Unlike prior systems that rely on cloud-based AI (e.g., ChatGPT) or gaze-based selection (e.g., GazePointAR), ClickAIXR integrates an on-device vision-language model (VLM) with a controlle…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-06 · Ke Shi, Yao Zhang, Feng Guo, Jinyuan Zhang, JunShuo Zhang, Shen Gao, Shuo Shang
General AI
Generative recommendation has emerged as a transformative paradigm for capturing the dynamic evolution of user intents in sequential recommendation. While flow-based methods improve the efficiency of diffusion models, they remain hindered by the "Noise-to-Data" paradigm, which introduces two critical inefficiencies: …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-07 · Qimin Zhong, Hao Liao, Haiming Qin, Mingyang Zhou, Rui Mao, Wei Chen, Naipeng Chao
General AI
Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical pers…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-07 · Andrew Kurtz, Klaudia Krawiecka
General AI
The governance of artificial intelligence has a blind spot: the machine identities that AI systems use to act. AI agents, service accounts, API tokens, and automated workflows now outnumber human identities in enterprise environments by ratios exceeding 80 to 1, yet no integrated framework exists to govern them. A sing…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-09 · Longxiang Jiao, Lukas Hofmann, Yiru Yang, Zhanyi Wu, Jonas Egeler
General AI
While micro-scale traffic simulations provide essential data for urban planning, they are rarely coupled with the high-fidelity visualization or auralization necessary for effective stakeholder communication. In this work, we present a real-time 4D visualization framework that couples the SUMO traffic with a photoreali…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-09 · Simon Gerstenecker, Andreas Geiger, Katrin Renz
General AI
Generalization under distribution shift remains a central bottleneck for closed-loop autonomous driving. Although simulators like CARLA enable safe and scalable testing, existing benchmarks rarely measure true generalization: they typically reuse training scenarios at test time. Success can therefore reflect memorizati…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-09 · Tao Xie, Peishan Yang, Yudong Jin, Yingfeng Cai, Wei Yin, Weiqiang Ren, Qian Zhang, Wei Hua, Sida Peng, Xiaoyang Guo, Xiaowei Zhou
General AI
This paper addresses the task of large-scale 3D scene reconstruction from long video sequences. Recent feed-forward reconstruction models have shown promising results by directly regressing 3D geometry from RGB images without explicit 3D priors or geometric constraints. However, these methods often struggle to maintain…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-09 · Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha
General AI
Applying steering vectors to large language models (LLMs) is an efficient and effective model alignment technique, but we lack an interpretable explanation for how it works: specifically, which internal mechanisms steering vectors affect and how this results in different model outputs. To investigate the causal mechani…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-04-27 · Fiza Naseer, Javed Ali Khan, Muhammad Yaqoob, Alexios Mylonas, Ishaya Gambo
General AI
Context: Software vulnerabilities pose significant security threats to software systems, especially as software is increasingly used across many areas of daily life, including health, government, and finance. Recently, transformer-based models have demonstrated promising results in automatic software vulnerability iden…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-07 · Bilal Khana, Waseem Shariff, Rory Coyne, Muhammad Ali Farooq, Peter Corcoran
General AI
As vehicles transition toward higher levels of automation, Driver Monitoring Systems (DMS) have become essential for ensuring human oversight, safety, and regulatory compliance in a vehicle. These systems rely on multimodal sensing and AI-driven inference to assess driver attention, cognitive state, and readiness to ta…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-05-11 · Yu Zhe, Yang Jiayan, Wei Junhao, Yu-Lin Tsai, Wang Chen
General AI
Low-Rank Adaptation (LoRA) has become a widely used mechanism for customizing diffusion models, enabling users to inject new visual concepts or styles through lightweight parameter updates. However, LoRAs can memorize training images, causing generated outputs to reproduce copyrighted or sensitive content. This risk is…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-05-11 · Bum Jun Kim, Gnankan Landry Regis N'guessan
General AI
Physics-informed neural networks (PINNs) train a single neural approximation by minimizing multiple physics- and data-derived losses, but the gradients of these losses often interfere and can stall optimization. Existing remedies typically treat this pathology either through scalar loss balancing or full-parameter-spac…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.5
2026-04-02 · William Hoy, Binxu Wang, Xu Pan
Research Track A · General AI
Evolution Strategies (ES) have emerged as a scalable gradient-free alternative to reinforcement-learning-based LLM fine-tuning, but it remains unclear whether comparable task performance implies comparable solutions in parameter space. We compare ES and Group Relative Policy Optimization (GRPO) across four tasks in bot…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.5
2026-04-02 · Zhanting Zhou, KaHou Tam, Ziqiang Zheng, Zeyu Ma
Research Track A · General AI
Multimodal recommendation systems (MRS) jointly model user-item interaction graphs and rich item content, but this tight coupling makes user data difficult to remove once learned. Approximate machine unlearning offers an efficient alternative to full retraining, yet existing methods for MRS mainly rely on a largely uni…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-03 · Anastasiia Filippova, David Grangier, Marco Cuturi, João Monteiro
General AI
Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory footprint of KV caching is significant and heavily impacts serving costs. This work proposes to lessen these memory requirements. While recent work has l…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.5
2026-04-07 · Mutsumi Sasaki, Kouta Nakayama, Yusuke Miyao, Yohei Oseki, Masaru Isonuma
Research Track A · General AI
When introducing Large Language Models (LLMs) into industrial applications, such as healthcare and education, the risk of generating harmful content becomes a significant challenge. While existing machine unlearning methods can erase specific harmful knowledge and expressions, diverse harmful content makes comprehensiv…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-13 · Md Tanvirul Alam
General AI
Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mappi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-13 · Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan
General AI
We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which uses a fixed mean-squared-error criterion, our approach introduces a learned discriminator to guide training. This change in objective induces a different generalized di…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-13 · Dujun Nie, Fengjiao Chen, Qi Lv, Jun Kuang, Xiaoyu Li, Xuezhi Cao, Xunliang Cai
General AI
While the shortage of explicit action data limits Vision-Language-Action (VLA) models, human action videos offer a scalable yet unlabeled data source. A critical challenge in utilizing large-scale human video datasets lies in transforming visual signals into ontology-independent representations, known as latent actions…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-13 · Seongyu Kim, Seungwoo Lee, Hyeonggon Ryu, Joon Son Chung, Arda Senocak
General AI
We address the problem of tactile localization, where the goal is to identify image regions that share the same material properties as a tactile input. Existing visuo-tactile methods rely on global alignment and thus fail to capture the fine-grained local correspondences required for this task. The challenge is amplifi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-13 · Bingyi Cao, Koert Chen, Kevis-Kokitsi Maninis, Kaifeng Chen, Arjun Karpur, Ye Xia, Sahil Dua, Tanmaya Dabral, Guangxing Han, Bohyung Han, Joshua Ainslie, Alex Bewley, Mithun Jacob, René Wagner, Washington Ramos, Krzysztof Choromanski, Mojtaba Seyedhosseini, Howard Zhou, André Araujo
General AI
Recent progress in vision-language pretraining has enabled significant improvements to many downstream computer vision applications, such as classification, retrieval, segmentation and depth prediction. However, a fundamental capability that these models still struggle with is aligning dense patch representations with …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-14 · Deyuan Liu, Peng Sun, Yansen Han, Zhenglin Cheng, Chuyan Chen, Tao Lin
General AI
The push for efficient text-to-image synthesis has moved the field toward one-step sampling, yet existing methods still face a three-way trade-off among fidelity, inference speed, and training efficiency. Approaches that rely on external discriminators can sharpen one-step performance, but they often introduce training …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-17 · Heewon Oh
General AI
We present ArtifactNet, a lightweight framework that detects AI-generated music by reframing the problem as forensic physics: extracting and analyzing the physical artifacts that neural audio codecs inevitably imprint on generated audio. A bounded-mask UNet (ArtifactUNet, 3.6M parameters) extracts codec residuals fro…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-17 · Tristan Kirscher, Alexandra Ertl, Klaus Maier-Hein, Xavier Coubez, Philippe Meyer, Sylvain Faisan
General AI
Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calib…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-18 · Gabriel Jason Lee, Jathurshan Pradeepkumar, Jimeng Sun
General AI
Electroencephalography (EEG) foundation models have shown strong potential for learning generalizable representations from large-scale neural data, yet their clinical deployment is hindered by distribution shifts across clinical settings, devices, and populations. Test-time adaptation (TTA) offers a promising solution …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-20 · Difan Jiao, Yilun Liu, Ye Yuan, Zhenwei Tang, Linfeng Du, Haolun Wu, Ashton Anderson
General AI
Guard models are widely used to detect harmful content in user prompts and LLM responses. However, state-of-the-art guard models rely solely on terminal-layer representations and overlook the rich safety-relevant features distributed across internal layers. We present SIREN, a lightweight guard model that harnesses the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-21 · Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Marco Huber, Andrea Atzori, Naser Damer, Fadi Boutros
General AI
Face Image Quality Assessment (FIQA) aims to assess the recognition utility of face samples and is essential for reliable face recognition (FR) systems. Existing approaches require computationally expensive procedures such as multiple forward passes, backpropagation, or additional training, and only recent work has foc…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-21 · Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Andrea Atzori, Fadi Boutros, Naser Damer
General AI
Face Image Quality Assessment is crucial for reliable face recognition systems, yet existing Vision Transformer-based approaches rely exclusively on final-layer representations, ignoring quality-relevant information captured at intermediate network depths. This paper presents the first comprehensive investigation of ho…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-22 · Adriana Aida, Walida Amer, Katarina Bankovic, Dhruv Behl, Fabian Busch, Annie Bhalla, Minh Duong, Florian Gienger, Rohan Godse, Denis Grachev, Ralf Gulde, Elisa Hagensieker, Junpeng Hu, Shivam Joshi, Tobias Knoblauch, Likith Kumar, Damien LaRocque, Keerthana Lokesh, Omar Moured, Khiem Nguyen, Christian Preyss, Ranjith Sriganesan, Vikram Singh, Carsten Sponner, Anh Tong, Dominik Tuscher, Marc Tuscher, Pavan Upputuri
General AI
Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evalu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-23 · Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
General AI
Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchmark with private scenes and trajectories, making fair cross-model comparison impossible. Existing public benchmarks offer useful metrics such as trajectory error, aesth…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.4
2026-04-29 · Hayate Iso, Tiyasa Mitra, Sudipta Mondal, Rasoul Shafipour, Venmugil Elango, Terry Kong, Yuki Huang, Seonjin Na, Izzy Putterman, Benjamin Chislett, Maor Ashkenazi, Joseph Guman, Gerald Shen, Tugrul Konuk, Ashwath Aithal, Ritika Borkar, Ran Zilberstein, Bita Rouhani
General AI
RL post-training of frontier language models is increasingly bottlenecked by autoregressive rollout generation, making rollout acceleration a central systems challenge. Many existing efficiency methods improve throughput by changing the rollout or optimization regime, for example, through off-policy execution, replay, …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.4
2026-04-29 · Naibin Gu, Chenxu Yang, Qingyi Si, Chuanyu Qin, Dingyu Yao, Peng Fu, Zheng Lin, Weiping Wang, Nan Duan, Jiaqi Wang
General AI
RLVR and OPD have become standard paradigms for post-training. We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying capability loss in different ways: mixed RLVR suffers from inter-capability divergence cost, while the pipeline of first trai…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.4
2026-04-29 · Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt
General AI
Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images across 15 fashion houses spanning 1991-2024 that makes this cultural logic inspectable. Given a photograph of…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.4
2026-04-29 · Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Chenyu You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang
General AI
Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are dis…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.4
2026-04-30 · Jona te Lintelo, Lichao Wu, Marina Krček, Sengim Karayalçin, Stjepan Picek
General AI
Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs) have significantly reduced inference costs through sparse activation. However, this sparse activation paradigm also introduces new safety challenges. Since only a subset of experts is engaged for each input, model behavior becomes coupled to routing…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.4
2026-05-01 · Minchan Kwon, Sunghyun Baek, Minseo Kim, Jaemyung Yu, Dongyoon Han, Junmo Kim
General AI
Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important, but achieving both is challenging. Generative Flow Networks (GFNs) that perform distribution matching are a promis…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.4
2026-05-05 · Enrico Vompa, Tanel Tammet
General AI
We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning app…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.3
2026-04-13 · Qin Liu
General AI
Existing LLM agent frameworks lack formal semantics: there is no principled way to determine whether an agent configuration is well-formed or will terminate. We present λ_A, a typed lambda calculus for agent composition that extends the simply-typed lambda calculus with oracle calls, bounded fixpoints (the ReAct loop…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-13 · David Nordström, Johan Edstedt, Fredrik Kahl, Georg Bökman
General AI
Finding matching keypoints between images is a core problem in 3D computer vision. However, modern matchers struggle with large in-plane rotations. A straightforward mitigation is to learn rotation invariance via data augmentation. However, it remains unclear at which stage rotation invariance should be incorporated. I…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-14 · Yinghao Qin, Mosab Bazargani, Edmund K. Burke, Carlos A. Coello Coello, Zhongmin Song, Jun Chen
General AI
This paper tackles the Electric Capacitated Vehicle Routing Problem (E-CVRP) through a bilevel optimization framework that handles routing and charging decisions separately or jointly depending on the search stage. By analyzing their interaction, we introduce a surrogate objective at the upper level to guide the search…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-14 · Junbin Su, Ziteng Xue, Shihui Zhang, Kun Chen, Weiming Hu, Zhipeng Zhang
General AI
Parameter-efficient fine-tuning (PEFT) in multimodal tracking reveals a concerning trend where recent performance gains are often achieved at the cost of inflated parameter budgets, which fundamentally erodes PEFT's efficiency promise. In this work, we introduce SEATrack, a Simple, Efficient, and Adaptive two-stream mu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-16 · Yunfu Deng, Yuhao Li, Josiah P. Hanna
General AI
In recent years, reinforcement learning (RL) has shown remarkable success in robotics when a fast and accurate simulator is available for a given task. When using RL and simulation, more simulator realism is generally beneficial but becomes harder to obtain as robots are deployed in increasingly complex and widescale d…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-17 · Eric Gan, Aryan Bhatt, Buck Shlegeris, Julian Stastny, Vivek Hebbar
General AI
As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results while evading detection. We introduce ASMR-Bench (Auditing for Sabotage in ML Research), a benchmark for evaluating the ability of auditors to detect sabotage in ML resea…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-17 · Ming-Bin Chen, Jey Han Lau, Lea Frermann
General AI
Measuring the quality of public deliberation requires evaluating not only civility or argument structure, but also the informational progress of a conversation. We introduce a framework for Conversational Information Gain (CIG) that evaluates each utterance in terms of how it advances collective understanding of the ta…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-17 · Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin
General AI
Large pre-trained language models are increasingly adapted to downstream tasks using parameter-efficient fine-tuning (PEFT), but existing PEFT methods are typically deterministic and unimodal, making them poorly suited for low-resource multimodal settings where predictive uncertainty and cross-modal reliability both ma…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.3
2026-04-17 · Matthew Frazier, Kostadin Damevski, Lori Pollock
General AI
Secondary school students enrolled in the AP Computer Science Principles (CSP) course commonly utilize web resources (e.g., tutorials, Q&A sites) to better understand key concepts in the curriculum. The primary obstacle to using these resources is finding information appropriate for the learning task and student's bac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-20 · Savya Khosla, Sethuraman T, Aryan Chadha, Alex Schwing, Derek Hoiem
General AI
Despite recent progress, vision-language encoders struggle with two core limitations: (1) weak alignment between language and dense vision features, which hurts tasks like open-vocabulary semantic segmentation; and (2) high token counts for fine-grained visual representations, which limits scalability to long videos. T…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-21 · Sarah Lykke Tost, Adson Lucas de Paiva Sales, Henrik Østergaard, Vaishali Dhanoa, Gabriela Molina León
General AI
We designed and implemented InvestChat, a multimodal tablet-based application that supports stock market exploration with multiple coordinated views and an LLM-powered chat. We evaluated the application with 12 novice investors. Our findings suggest that combining natural language, touch, and pen input during stock mar…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-21 · Mengting Chen, Zhengrui Chen, Yongchao Du, Zuan Gao, Taihang Hu, Jinsong Lan, Chao Lin, Yefeng Shen, Xingjian Wang, Zhao Wang, Zhengtao Wu, Xiaoli Xu, Zhengze Xu, Hao Yan, Mingzhou Zhang, Jun Zheng, Qinye Zhou, Xiaoyong Zhu, Bo Zheng
General AI
Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly efficient. First, our syst…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-21 · Jean Mercat, Sedrick Keh, Kushal Arora, Isabella Huang, Paarth Shah, Haruki Nishimura, Shun Iwase, Katherine Liu
General AI
We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize in the action training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-22 · Sina Gholami, Abdulmoneam Ali, Tania Haghighi, Ahmed Arafa, Minhaj Nur Alam
General AI
Federated learning (FL) enables collaborative model training without sharing raw data; however, the presence of noisy labels across distributed clients can severely degrade the learning performance. In this paper, we propose FedSIR, a multi-stage framework for robust FL under noisy labels. Different from existing appro…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-23 · Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu
General AI
LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models, improving performance on specialized tasks. This quality and ease of dissemination drive the emergence of a skill economy: free s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-23 · Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik
General AI
As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We introduce MathDuels, a self-play benchmark in which models occupy …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-24 · Ashley J. Chen, Yijia Cao, Minghao Shao, Ramesh Karri, Muhammad Shafique
General AI
The emergence of large language models has enabled vibe coding, a natural language approach to programming in which users describe intent and AI generates or revises code, potentially broadening access to programming while preserving meaningful learning outcomes. We investigate its educational value through a month-lon…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-24 · Suvam Basak, Amitangshu Pal, Debopam Bhattacherjee
General AI
The May 2024 solar superstorm highlighted the vulnerability of rapidly expanding low Earth orbit (LEO) satellite networks to severe space weather events. To systematically evaluate LEO network resilience, we introduce an open-source tool, CosmicDancePro. It enables a comprehensive analysis of the effects of solar storm…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-26 · Sophie Chiang, Tom Brennan, Fethiye Irmak Dogan, Jiaee Cheong, Hatice Gunes
General AI
In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision-Language Models (VLMs), their deployment in clinical settings has raised concerns due to their lack of transparency and…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-27 · Zhihan Zhang, Lizi Liao
General AI
Chart-to-code generation converts a chart image into an executable plotting script, enabling faithful reproduction and editable visualizations. Existing methods are largely Python-centric, limiting practical use and overlooking a critical source of supervision: the same chart can be expressed by semantically equivalent…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-27 · Siavash Golkar, Jake Kovalic, Irina Espejo Morales, Samuel Sledzieski, Minhuan Li, Ksenia Sokolova, Geraud Krawezik, Alberto Bietti, Claudia Skok Gibbs, Roman Klypa, Shengwei Xiong, Francois Lanusse, Liam Parker, Kyunghyun Cho, Miles Cranmer, Tom Hehir, Michael McCabe, Lucas Meyer, Rudy Morel, Payel Mukhopadhyay, Mariel Pettee, Helen Qu, Jeff Shen, David Fouhey, Hadi Sotoudeh, Vikram Mulligan, Pilar Cossio, Sonya M. Hanson, Alisha N. Jones, Olga G. Troyanskaya, Shirley Ho
General AI
Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or for a fixed forward task. We present MIMIC, a generative multimodal foundation model trained on our newly curated and ali…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-27 · Boyang Wang, Guangyi Xu, Zhipeng Tang, Jiahui Zhang, Zezhou Cheng
General AI
Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. While SBD has been widely studied in the literature, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-d…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-27 · Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang, Zeyu Zhang, Yefei He, Yanbo Ding, Xirui Hu, Donny Y. Chen, Zhiyuan He, Yuqing Yang, Bohan Zhuang
General AI
Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns v…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-04-29 · Catherine Liu, Tao Long, Asya Vaisberg, Chau Vu, Jiaju Ma, Jingyi Li
General AI
Creativity support tools (CSTs) aim to elevate the quality of artists' creative processes and artifacts. Yet most current CST evaluations overlook temporal and social aspects of tool use. To address this gap, we present a longitudinal, group-based CST evaluation through a three-week deployment of ArtKrit, a computation…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-04-29 · Evangelia Kopadi, Dimitris Kalles
General AI
Can Neural Assemblies (groups of neurons that fire together and strengthen through co-activation) learn the direction of causal influence between variables? While established as a computationally general substrate for classification, parsing, and planning, neural assemblies have not yet been shown to internalize ca…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-04-29 · Yuxuan Tian, Yurun Jin, Bin Yu, Yukun Shi, Hao Wu, Chi Harold Liu, Kai Chen, Cong Huang
General AI
Robotic manipulation critically requires reasoning about future spatial-temporal interactions, yet existing VLA policies and world-model-enhanced policies do not fully model action-relevant spatial-temporal interaction structure. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-tem…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-04-29 · Zhuofan Lou, Shihang Zhang, Fangle Zhu, Shengjie Ye, Pingyu Wang
General AI
We propose UAPAR, an Uncertainty-Aware Pedestrian Attribute Recognition framework. To the best of our knowledge, this is the first EDL-based uncertainty-aware framework for pedestrian attribute recognition (PAR). Unlike conventional deterministic methods, which fail to assess prediction reliability on low-quality sampl…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-04-30 · Maykon Nunes, Emanuel Coutinho, Carla Bezerra, Ivan Machado
General AI
Angular is one of the most widely adopted frameworks for developing large-scale, dynamic web applications. As projects increase in scope and complexity, developers face growing challenges in managing architecture and maintaining clean, modular code. These challenges often lead to design flaws, commonly referred to as c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-04-30 · Tianyuan Wu, Chaokun Chang, Lunxi Cao, Wei Gao, Wei Wang
General AI
Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback, yet existing approaches fall into two extremes: application-l…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-04-30 · Kehong Gong, Zhengyu Wen, Dao Thien Phong, Mingxi Xu, Weixia He, Qi Wang, Ning Zhang, Zhengyu Li, Guanli Hou, Dongze Lian, Xiaoyu He, Mingyuan Zhang, Hanwang Zhang
General AI
Recent methods for arbitrary-skeleton motion capture from monocular video follow a factorized pipeline, where a Video-to-Pose network predicts joint positions and an analytical inverse-kinematics (IK) stage recovers joint rotations. While effective, this design is inherently limited, since joint positions do not fully …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-04-30 · Junyoung Lee, Sookwan Han, Jeonghwan Kim, Inhee Lee, Mingi Choi, Jisoo Kim, Wonjung Woo, Hanbyul Joo
General AI
Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where multiple humans and robots share a workspace, acting concurrently on interleaved subtasks with tight spatial and temporal coupling. This regime remains underexplored because …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-05-01 · Bruce Rushing, Angela Danquah, Alireza Namazi, Arjun Dirghangi, Heman Shakeri
General AI
Effectively stratifying patient risk in chronic diseases like glaucoma is a major clinical challenge. Clinicians need tools to identify patients at high risk of progression from sparse and irregularly-sampled electronic health records (EHRs). We propose a novel deep kernel learning (DKL) architecture that leverages a G…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.2
2026-05-01 · Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan
General AI
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.2
2026-05-01 · Guandong Li, Mengxia Ye
General AI
Image editing instructions are heterogeneous: a color swap, an object insertion, and a physical-action edit all demand different spatial coverage and different reasoning depth, yet existing reasoning-based editors apply a single fixed inference recipe to every instruction. We argue that adaptivity along both the spatia…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.2
2026-05-03 · Sen Fang, Hongbin Zhong, Yanxin Zhang, Dimitris N. Metaxas
General AI
Existing large-scale sign language resources typically provide supervision only at the level of raw video-text alignment and are often produced in laboratory settings. While such resources are important for semantic understanding, they do not directly provide a unified interface for open-world recognition and translati…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.0
2026-03-25 · Matan Ben-Yosef, Tavi Halperin, Naomi Ken Korem, Mohammad Salama, Harel Cain, Asaf Joseph, Anthony Chen, Urska Jelercic, Ofir Bibi
General AI
Controlling video and audio generation requires diverse modalities, from depth and pose to camera trajectories and audio transformations, yet existing approaches either train a single monolithic model for a fixed set of controls or introduce costly architectural changes for each new modality. We introduce AVControl, a …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-03-25 · Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna
General AI
Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-03-30 · Zhangqi Jiang, Zheng Sun, Xianfang Zeng, Yufeng Yang, Xuanyang Zhang, Yongliang Wu, Wei Cheng, Gang Yu, Xu Yang, Bihan Wen
General AI
Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics fail to adequately capture visual consistency, i.e., the preservation of identity, …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-04-02 · Aleksandar Cvejic, Rameen Abdal, Abdelrahman Eldesokey, Bernard Ghanem, Peter Wonka
General AI
When evaluating identity-focused tasks such as personalized generation and image editing, existing vision encoders entangle object identity with background context, leading to unreliable representations and metrics. We introduce the first principled framework to address this vulnerability using Near-identity (NearID) d…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-04-05 · Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li
General AI
Selecting LLM-generated code candidates using LLM-generated tests is challenging because the tests themselves may be incorrect. Existing methods either treat all tests equally or rely on ad-hoc heuristics to filter unreliable tests. Yet determining test correctness requires knowing which codes are correct, creating a c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-04-06 · Shwai He, Guoheng Sun, Haichao Zhang, Yun Fu, Ang Li
General AI
Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-04-06 · Yicheng Xiao, Wenhu Zhang, Lin Song, Yukang Chen, Wenbo Li, Nan Jiang, Tianhe Ren, Haokun Lin, Wei Huang, Haoyang Huang, Xiu Li, Nan Duan, Xiaojuan Qi
General AI
Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. Current models are insufficient for fine-grained spatial manipulations, motivating a dedicated assessment suite. Our contributions are as follows: (i) We introduce SpatialEdit-Bench, a complete…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-04-28 · Wenqi Jia, Zekun Li, Abhay Mittal, Chengcheng Tang, Chuan Guo, Lezi Wang, James Matthew Rehg, Lingling Tao, Size An
General AI
Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morpholog…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-04-28 · Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng Zhang, Lilin Wang
General AI
Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for trajectory sampling. H…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-05-07 · Rui Wang, Yue Zhang, Jiehong Lin, Kuncheng Luo, Jianan Wang, Zhongrui Wang, Xiaojuan Qi
General AI
World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and future actions. However, current WAMs typically execute a fixed number of predicted actions after each model inference, leaving the robot blind to whether the imagined f…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.0
2026-05-08 · Yuanzhi Wang, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Kai Yu, Tianxiang Zheng, Qinglin Lu, Zhen Cui
General AI
Human image animation has witnessed significant advancements, yet generating high-fidelity hand motions remains a persistent challenge due to their high degrees of freedom and motion complexity. While reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, i…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.8
2026-03-23 · Ulugbek Shernazarov, Rostislav Svitsov, Bin Shi
General AI
Fine-tuning large language models for domain-specific tasks such as medical text summarization demands substantial computational resources. Parameter-efficient fine-tuning (PEFT) methods offer promising alternatives by updating only a small fraction of parameters. This paper compares three adaptation approaches: Low-Ran…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 5.8
2026-03-23 · Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn
General AI
Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit gener…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-03-26 · Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier, Andrea Dan Ryals, Lorenzo Pollini, Mario G. C. A. Cimino
General AI
This paper presents an experimental platform for studying intentional-state attribution toward a non-humanoid robot. The system combines a simulated robot, realistic task environments, and large language model-based explanatory layers that can express the same behavior in mentalistic, teleological, or mechanistic terms…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-03-26 · Cole Walsh, Rodica Ivan
General AI
Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable or superior to those of trained human raters, but have frequently been demonstrated to be vulnerable to the influence of construct-i…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-03-27 · Zhangyong Liang, Huanhuan Gao
General AI
In practical structural design and solid mechanics simulations, material properties inherently exhibit random variations within bounded intervals. However, evaluating mechanical responses under continuous material uncertainty remains a persistent challenge. Traditional numerical approaches, such as the Finite Element M…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.8
2026-03-27 · Wooseong Jeong, Wonyoung Lee, Kuk-Jin Yoon
General AI
Merging multiple Low-Rank Adaptation (LoRA) modules is promising for constructing general-purpose systems, yet challenging because LoRA update directions span different subspaces and contribute unevenly. When merged naively, such mismatches can weaken the directions most critical to certain task losses while overemphas…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.8
2026-03-30 · Ashwini Dasare, Nirmesh Shah, Ashishkumar Gudmalwar, Pankaj Wasnik
General AI
Evaluating AI generated dubbed content is inherently multi-dimensional, shaped by synchronization, intelligibility, speaker consistency, emotional alignment, and semantic context. Human Mean Opinion Scores (MOS) remain the gold standard but are costly and impractical at scale. We present a hierarchical multimodal archi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-03-30 · Kun Tang, Xinquan Yang, Mianjie Zheng, Xuefen Liu, Xuguang Li, Xiaoqi Guo, Ruihan Chen, Linlin Shen, He Meng
General AI
The scarcity and high cost of expert annotations in dental imaging present a significant challenge for the development of AI in dentistry. DINOv3, a state-of-the-art, self-supervised vision foundation model pre-trained on 1.7 billion images, offers a promising pathway to mitigate this issue. However, its reliability wh…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-03-30 · Sujith Pulikodan, Abhayjeet Singh, Agneedh Basu, Lokesh Rady, Nihar Desai, Pavan Kumar J, Prajjwal Srivastav, Pranav D Bhat, Raghu Dharmaraju, Ritika Gupta, Sathvik Udupa, Saurabh Kumar, Sumit Sharma, Vaibhav Vishwakarma, Visruth Sanka, Dinesh Tewari, Harsh Dhand, Amrita Kamat, Sukhwinder Singh, Shikhar Vashishth, Partha Talukdar, Raj Acharya, Prasanta Kumar Ghosh
General AI
Project VAANI is an initiative to create an India-representative multi-modal dataset that comprehensively maps India's linguistic diversity, starting with 165 districts across the country in its first two phases. Speech data is collected through a carefully structured process that uses image-based prompts to encourage …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-03-31 · Wenyi Li, Renkai Luo, Yue Yu, Huan-ang Gao, Mingju Gao, Li Yuan, Chaoyou Fu, Hao Zhao
General AI
AI-assisted coding has rapidly reshaped software practice and research workflows, yet today's models still struggle to produce correct code for complex 3D geometric vision. If models could reliably write such code, the research of our community would change substantially. To measure progress toward that goal, we introd…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-03-31 · Mohammadhossein Khojasteh, Yifan Jiang, Stefano De Giorgis, Frank van Harmelen, Filip Ilievski
General AI
Analogical reasoning is a key driver of human generalization in problem-solving and argumentation. Yet, analogies between narrative structures remain challenging for machines. Cognitive engines for structural mapping are not directly applicable, as they assume pre-extracted entities, whereas LLMs' performance is sensit…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-04-01 · Amin Bigdeli, Mert Incesu, Negar Arabzadeh, Charles L. A. Clarke, Ebrahim Bagheri
General AI
We present ReFormeR, a pattern-guided approach for query reformulation. Instead of prompting a language model to generate reformulations of a query directly, ReFormeR first elicits short reformulation patterns from pairs of initial queries and empirically stronger reformulations, consolidates them into a compact librar…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-04-01 · Kawtar Zaher, Olivier Buisson, Alexis Joly
General AI
Building on existing approaches, we revisit Human-in-the-Loop Object Retrieval, a task that consists of iteratively retrieving images containing objects of a class-of-interest, specified by a user-provided query. Starting from a large unlabeled image collection, the aim is to rapidly identify diverse instances of an ob…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-04-02 · Feiyu Zhou, Marios Impraimakis
General AI
The wind-induced structural response forecasting capabilities of a novel transformer methodology are examined here. The model also provides a digital twin component for bridge structural health monitoring. Firstly, the approach uses the temporal characteristics of the system to train a forecasting model. Secondly, the …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-04-06 · David Nordström, Johan Edstedt, Georg Bökman, Jonathan Astermark, Anders Heyden, Viktor Larsson, Mårten Wadenbäck, Michael Felsberg, Fredrik Kahl
General AI
Local feature matching has long been a fundamental component of 3D vision systems such as Structure-from-Motion (SfM), yet progress has lagged behind the rapid advances of modern data-driven approaches. The newer approaches, such as feed-forward reconstruction models, have benefited extensively from scaling dataset siz…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-04-07 · Yulin Zou, Yan Chen, Wenyan Chen, JooYoung Park, Shivaraman Nitin, Luo Tao, Francisco Romero, Dmitrii Ustiugov
General AI
Video streaming analytics is a crucial workload for vision-language model serving, but the high cost of multimodal inference limits scalability. Prior systems reduce inference cost by exploiting temporal and spatial redundancy in video streams, but they target either the vision transformer (ViT) or the LLM with a limit…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-04-07 · Junbin Zhang, Meng Cao, Feng Tan, Yikai Lin, Yuexian Zou
General AI
Achieving fine-grained and structurally sound controllability is a cornerstone of advanced visual generation. Existing part-based frameworks treat user-provided parts as an unordered set and therefore ignore their intrinsic spatial and semantic relationships, which often results in compositions that lack structural int…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-04-07 · Lin Mu, Haiyang Wang, Li Ni, Lei Sang, Zhize Wu, Peiquan Jin, Yiwen Zhang
General AI
Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs), and recent Mixture-of-Experts (MoE) extensions further enhance flexibility by dynamically combining multiple LoRA experts. However, existing MoE-augmented LoRA methods assume that experts operate independently, often lea…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-04-09 · Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou, Hui Wang, Baole Fang, Yang Tian, Mulin Yu, Qiaojun Yu, Li Ma, Hengjie Li, Hanqing Wang, Jia Zeng, Jiangmiao Pang
General AI
Robotic manipulation with deformable objects represents a data-intensive regime in embodied learning, where shape, contact, and topology co-evolve in ways that far exceed the variability of rigid objects. Although simulation promises relief from the cost of real-world data acquisition, prevailing sim-to-real pipelines remain …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-04-28 · Bangzhao Shu, Arinjay Singh, Mai ElSherief
General AI
Large language models (LLMs) are increasingly used in emotionally sensitive human-AI applications, yet little is known about how emotion recognition is internally represented. In this work, we investigate the internal mechanisms of emotion recognition in LLMs using sparse autoencoders (SAEs). By analyzing sparse featur…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-03-23 · Zixian Huang, Kaichen Yang, Xu Huang, Feiyang Hao, Qiming Ge, Bowen Li, He Du, Kai Chen, Qipeng Guo
General AI
A widely adopted strategy for model enhancement is to use synthetic data generated by a stronger model for supervised fine-tuning (SFT). However, for emerging reasoning models like Qwen3-8B, this approach often fails to improve reasoning capabilities and can even lead to a substantial drop in performance. In this work,…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.5
2026-04-10 · Ximing Xing, Ziteng Xue, Zhenxi Li, Weicong Liang, Linqing Wang, Zhantao Yang, Tiankai Hang, Zijin Yin, Qinglin Lu, Chunyu Wang, Qian Yu
General AI
Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on generic byte-level tokenization inherited from natural language processing, which poorly reflects the geometric structure of vector graphics…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.5
2026-04-14 · Yein Park, Jungwoo Park, Jaewoo Kang
General AI
Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes. As tense jailbreaking demonstrates that models refusing harmful requests often comply when rephrased in past tense, a critical generalization gap is revealed in current al…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.5
2026-04-14 · Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, Zhiqiang Shen
General AI
Claude Code is an agentic coding tool that can run shell commands, edit files, and call external services on behalf of the user. This study describes its comprehensive architecture by analyzing the publicly available TypeScript source code and further comparing it with OpenClaw, an independent open-source AI agent syst…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.5
2026-04-16 · Natapong Nitarach
General AI
Majority voting over multiple LLM attempts improves mathematical reasoning, but correlated errors limit the effective sample size. A natural fix is to assign different reasoning strategies to different voters. The approach, Diverse Prompt Mixer, is tested on the AIMO 3 competition: 3 models, 23+ experiments, 50 IMO-lev…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-18 · Syed Muhammad Aqdas Rizvi
General AI
Decentralized Autonomous Organizations (DAOs) are inclined to explore Small Language Models (SLMs) as edge-native constitutional firewalls to vet proposals and mitigate semantic social engineering. While scaling inference-time compute (System 2) enhances formal logic, its efficacy in highly adversarial, cryptoeconomic gov…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.5
2026-04-20 · Qifan Zhang, Dongyang Ma, Tianqing Fang, Jia Li, Jing Tang, Nuo Chen, Haitao Mi, Yan Wang
General AI
Most agents today "self-evolve" by following rewards and rules defined by humans. However, this process remains fundamentally dependent on external supervision; without human guidance, the evolution stops. In this work, we train agents to possess an intrinsic meta-evolution capability to spontaneously learn about uns…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-21 · Qingyang Zhang, Xinke Kong, Haitao Wu, Qinghua Hu, Minghao Wu, Baosong Yang, Yu Cheng, Yun Luo, Ganqu Cui, Changqing Zhang
General AI
Test-time training (TTT) adapts model parameters on unlabeled test instances during inference time, which continuously extends capabilities beyond the reach of offline training. Despite initial gains, existing TTT methods for LRMs plateau quickly and do not benefit from additional test-time compute. Without external ca…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-25 · Wenlong Deng, Qi Zeng, Jiaming Zhang, Minghui Chen, Zixin Ding, Christos Thrampoulidis, Boying Gong, Xiaoxiao Li
General AI
Data valuation is essential for enhancing the transparency and accountability of large language models (LLMs) and vision-language models (VLMs). However, existing methods typically rely on gradient computations, making them computationally prohibitive for billion-parameter models and precluding batch parallelization. I…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.4
2026-04-29 · Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang
General AI
In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, additional attributes are combined to associate with data samples. We show that the space spanned b…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.4
2026-05-01 · Yi Wang, Xinchen Li, Pengwei Xie, Pu Yang, Buqing Nie, Yunuo Cai, Qinglin Zhang, Chendi Qu, Jeffrey Wu, Jianheng Song, Xinlin Ren, Jingshun Huang, Mingjie Pan, Siyuan Feng, Zhi Chen, Jianlan Luo
General AI
Generalist robot policies increasingly benefit from large-scale pretraining, but offline data alone is insufficient for robust real-world deployment. Deployed robots encounter distribution shifts, long-tail failures, task variations, and human correction opportunities that fixed demonstration datasets cannot fully capt…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.3
2026-04-13 · Junhao Chen, Kejun Gao, Yuehan Cui, Mingze Sun, Mingjin Chen, Shaohui Wang, Xiaoxiao Long, Fei Ma, Qi Tian, Ruqi Huang, Hao Zhao
General AI
Despite rapid progress in video generation, existing models are incapable of producing vector animation, a dominant and highly expressive form of multimedia on the Internet. Vector animations offer resolution-independence, compactness, semantic structure, and editable parametric motion representations, yet current gene…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-13 · Xingjian Ran, Shujie Zhang, Weipeng Zhong, Li Luo, Bo Dai
General AI
Generating high-fidelity 3D indoor scenes remains a significant challenge due to data scarcity and the complexity of modeling intricate spatial relations. Current methods often struggle to scale beyond training distribution to dense scenes or rely on LLMs/VLMs that lack the ability for precise spatial reasoning. Buildi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-14 · Yaru Niu, Zhenlong Fang, Binghong Chen, Shuai Zhou, Revanth Senthilkumaran, Hao Zhang, Bingqing Chen, Chen Qiu, H. Eric Tseng, Jonathan Francis, Ding Zhao
General AI
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first de…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-14 · Megha Chakraborty, Darssan L. Eswaramoorthi, Madhur Thareja, Het Riteshkumar Shah, Finlay Palmer, Aryaman Bahl, Michelle A Ihetu, Amit Sheth
General AI
AI-driven education platforms have made some progress in personalisation, yet most remain constrained to static adaptation--predefined quizzes, uniform pacing, or generic feedback--limiting their ability to respond to learners' evolving understanding. This shortfall highlights the need for systems that are both context…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-14 · Nafis Fuad Shahid, Maroof Ahmed, Md Akib Haider, Saidur Rahman Sagor, Aashnan Rahman, Md Azam Hossain
General AI
Multimodal federated learning enables privacy-preserving collaborative model training across healthcare institutions. However, a fundamental challenge arises from modality heterogeneity: many clinical sites possess only a subset of modalities due to resource constraints or workflow variations. Existing approaches addre…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-14 · Yida Niu, Xinhai Chang, Xin Liu, Ziyuan Jiao, Yixin Zhu
General AI
Robots deployed in unstructured environments must coordinate whole-body motion -- simultaneously moving a mobile base and arm -- to interact with the physical world. This coupled mobility and dexterity yields a state space that grows combinatorially with scene and object diversity, demanding datasets far larger than th…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-16 · Mitch Adler, Matthew Russo, Michael Cafarella
General AI
In the past year, researchers have started to create agentic systems that can design real-world CAD-style objects in a training-free setting, a new variety of system that we call Agent-Aided Design. Generally speaking, these systems place an agent in a feedback loop in which it can write code, compile that code to an a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-16 · Manan Gupta, Dhruv Kumar
General AI
LLM-as-judge frameworks are increasingly used for automatic NLG evaluation, yet their per-instance reliability remains poorly understood. We present a two-pronged diagnostic toolkit applied to SummEval: (1) a transitivity analysis that reveals widespread per-input inconsistency masked by low aggregate violat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-16 · Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara, Steven McDonagh
General AI
Understanding emotions is a fundamental ability for intelligent systems to be able to interact with humans. Vision-language models (VLMs) have made tremendous progress in the last few years for many visual tasks, potentially offering a promising solution for understanding emotions. However, it is surprising that even t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-17 · Deepak Kumar, Abhishek Pratap Singh, Puneet Kumar, Xiaobai Li, Balasubramanian Raman
General AI
Understanding affective dynamics in real-world social systems is fundamental to modeling and analyzing human-human interactions in complex environments. Group affect emerges from intertwined human-human interactions, contextual influences, and behavioral cues, making its quantitative modeling a challenging computationa…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.3
2026-04-17 · Simone Heisinger, Luca Pulina, Martina Seidl
General AI
The QBF Gallery 2023, the last QBF evaluation event, continues the tradition to survey and document the state of the art in solving quantified Boolean formulas (QBFs). It provides a detailed overview by collecting newly developed solvers and formulas as benchmarks. This report documents the solvers and formulas submitt…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.3
2026-04-20 · Rui Qian, Chuanhang Deng, Qiang Huang, Jian Xiong, Mingxuan Li, Yingbo Zhou, Wei Zhai, Jintao Chen, Dejing Dou
General AI
Reasoning segmentation requires models to ground complex, implicit textual queries into precise pixel-level masks. Existing approaches rely on a single segmentation token <SEG>, whose hidden state implicitly encodes both semantic reasoning and spatial localization, limiting the model's ability to explicitly …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-20 · A. Sophia Koepke, Daniil Zverev, Shiry Ginosar, Alexei A. Efros
General AI
The Platonic Representation Hypothesis suggests that neural networks trained on different modalities (e.g., text and images) align and eventually converge toward the same representation of reality. If true, this has significant implications for whether modality choice matters at all. We show that the experimental evide…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-20 · Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki, Ethan Gotlieb Wilcox
General AI
A recent study (Kuribayashi et al., 2025) has shown that human sentence processing behavior, typically measured on syntactically unchallenging constructions, can be effectively modeled using surprisal from early layers of large language models (LLMs). This raises the question of whether such advantages of internal laye…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-20 · Anda Cao, Zhuo Gou, Yi Wang, Kaixuan Chen, Yu Wang, Can Wang, Mingli Song, Jie Song
General AI
Merging multiple Low-Rank Adaptation (LoRA) experts into a single backbone is a promising approach for efficient multi-task deployment. While existing methods strive to alleviate interference via weight interpolation or subspace alignment, they rest upon the implicit assumption that all LoRA matrices contribute constru…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-20 · Haoyu Wu, Jiwen Yu, Yingtian Zou, Xihui Liu
General AI
Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing approaches are limited …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-20 · Wei Yao, Haohan Ma, Hongwen Zhang, Yunlian Sun, Liangjun Xing, Zhile Yang, Yuanjun Guo, Yebin Liu, Jinhui Tang
General AI
Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in multi-agent coordination, and limited generalization across objects. In this paper, we present SynAgent, a unified framework that enables scalable and physicall…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-21 · Isaiah Thompson, Tanmay Sen, Ritwik Bhattacharya
General AI
Modern distributed systems generate massive volumes of log data that are critical for detecting anomalies and cyber threats. However, in real world settings, these logs are often distributed across multiple organizations and cannot be centralized due to privacy and security constraints. Existing log anomaly detection m…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-21 · Nikita Kister, Pradyumna YM, István Sárándi, Jiayi Wang, Anna Khoreva, Gerard Pons-Moll
General AI
Training embodied agents to understand 3D scenes as humans do requires large-scale data of people meaningfully interacting with diverse environments, yet such data is scarce. Real-world motion capture is costly and limited to controlled settings, while existing synthetic datasets rely on simple geometric heuristics tha…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-21 · Carles Navarro, Philipp Tholke, Gianni de Fabritiis
General AI
Structure-based drug discovery faces the dual challenge of accurately capturing 3D protein-ligand interactions while navigating ultra-large chemical spaces to identify synthetically accessible candidates. In this work, we present a unified framework that addresses these challenges by combining contrastive 3D structure …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-21 · Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge
General AI
Scaling humanoid foundation models is bottlenecked by the scarcity of robotic data. While massive egocentric human data offers a scalable alternative, bridging the cross-embodiment chasm remains a fundamental challenge due to kinematic mismatches. We introduce UniT (Unified Latent Action Tokenizer via Visual Anchoring)…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-22 · William Scarbro, Ravi Mangal
General AI
Autonomous systems that rely on learned perception can make unsafe decisions when sensor readings are misclassified. We study shielding for this setting: given a proposed action, a shield blocks actions that could violate safety. We consider the common case where system dynamics are known but perception uncertainty mus…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-22 · Mohammed Zeehan Saleheen, Markus Wagner, Reza Razzaghi, Hao Wang
General AI
Reliable operation is a central motivation for deploying renewable-based microgrids. This paper presents a systematic rapid review that positions reliability as the central organizing principle for microgrid design. Specifically, this review systematically synthesizes recent literature to examine how planning assumptio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-23 · Hao-Yu Hsu, Tianhang Cheng, Jing Wen, Alexander G. Schwing, Shenlong Wang
General AI
Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts pu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-23 · Yanran Zhang, Wenzhao Zheng, Yifei Li, Bingyao Yu, Yu Zheng, Lei Chen, Jiwen Lu, Jie Zhou
General AI
In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent, development, these two fields have evolved distinct architectural paradigms: the former predominantly relies on generative networks, while the latter favors discrimin…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-27 · Mufhumudzi Muthivhi, Terence L. van Zyl
General AI
There has been growing interest in studying the complexity of Rectified Linear Unit (ReLU) based activation networks. Recent work investigates the evolution of the number of piecewise-linear partitions (linear regions) that are formed during training. However, current research is limited to examining the complexity of …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.2
2026-04-29 · Frank Ginac
General AI
The integration of Large Language Models (LLMs) into the software development lifecycle (SDLC) masks a critical socio-technical failure: Cognitive-Systemic Collapse. This paper introduces "Epistemological Debt," the hidden carrying cost incurred when engineers substitute logical derivation with passive AI verification.…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-04-29 · Carol Hanna, Karine Even-Mendoza, W. B. Langdon, Mar Zamorano López, Justyna Petke, Federica Sarro
General AI
Despite the operational importance of hot fixes, large-scale evidence on how they reshape routine maintenance workflows, particularly in the era of autonomous coding agents, remains limited. We analyse hot fixes present in over 61,000 GitHub repositories from the Hao-Li/AIDev dataset and find consistent patterns of urg…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-04-29 · Junan Lin, Paul J. Goulart, Luca Furieri
General AI
The Alternating Direction Method of Multipliers (ADMM) is a widely used method for structured convex optimization, and its practical performance depends strongly on the choice of penalty and relaxation parameters. Motivated by settings such as Model Predictive Control (MPC), where one repeatedly solves related optimiza…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-04-29 · Joss Armstrong
General AI
Category-based coordination mechanisms allocate resources by mapping a declared service category to a fixed resource profile, without observing individual demand types. We establish three results for this class of mechanisms. First, the relative welfare gap Delta satisfies a tight two-sided bound in terms of the aggreg…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-04-29 · Michael Greinecke, Karolina Vocke
General AI
We study stability notions for networked many-to-many matching markets with individually insignificant agents in distributional form. Outcomes are formulated as joint distributions over characteristics of agents and contract choices. Characteristics can lie in an arbitrary Polish space. We provide a mechanical method f…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-04-29 · Felix Eder, Zeno Maesen, Yurii Skourski, Enrico Giannini, Oksana Zaharko, Fabian O. von Rohr
General AI
The layered delafossite-like antiferromagnet AgCrSe₂ is a superionic conductor at high temperatures and has been reported to exhibit anomalous Hall behavior and Kondo physics at low temperatures. These extraordinary transport properties have been established almost exclusively on single crystals grown by chemical va…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-04-29 · Partha Ghose
General AI
The widespread claim that violations of Bell inequalities establish the nonlocality of nature is critically reexamined. It is argued that this conclusion is not logically compelled by either the Einstein-Podolsky-Rosen (EPR) argument or Bell's theorem. The analysis highlights the central role of counterfactual reason…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-04-30 · Lautaro Giordano, Sebastian Gonçalves, José Roberto Iglesias, María Fabiana Laguna
General AI
We present a minimal agent-based model of interacting agents characterized by their wealth to study taxation and inequality in a non-conservative economy. Wealth evolves through an extremal stochastic replacement process in which the poorest agent has its wealth replaced by a new random value, financed through a collec…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-04-30 · Vishnuprasadh Kumaravelu, Sunil Gupta, P. K. Srijith
General AI
Exponential growth in the scale of modern foundation models has led to the widespread adoption of Low-Rank Adaptation (LoRA) as a parameter-efficient fine-tuning technique. However, standard LoRA implementations disregard the varying intrinsic dimensionality of model layers and enforce a uniform rank, leading to parame…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-05-01 · Zizhong Yan, Jingrong Li, Yi Zhang
General AI
Estimating network formation models with degree heterogeneity raises two problems in empirical networks. First, agents that send no links, receive no links, or link to all remaining agents can make the fixed-effects MLE fail to exist. Trimming these agents changes the estimation sample and induces selection bias. Secon…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-05-01 · George Stoica, Sayak Paul, Matthew Wallingford, Vivek Ramanujan, Abhay Nori, Winson Han, Ali Farhadi, Ranjay Krishna, Judy Hoffman
General AI
Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and intermediate point, yielding an extremely sparse and high-variance training signal. This …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-05-01 · Laurent Hébert-Dufresne, Antoine Allard, Jean-Gabriel Young, William H. W. Thompson, Guillaume St-Onge
General AI
Complex contagions describe systems where the probability or rate of contagious transmission is a nonlinear function of the exposure to contagious agents. These models were first studied theoretically but have since been used to capture effects such as nonconformism, social reinforcement or peer pressure in empirical d…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-05-01 · Jingxi Pu, Tonghua Liu, Zhilin Guan, Siqiao Li, Yang Ming, Zheng Cong, Wei Zhang, Fangwei Li
General AI
With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising problem of low-dose computed tomography using deep learning. Although low-dose computed tomography reduces radiation exposure to patients, it also introduces more noise, …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-05-04 · Xinyang Wang
General AI
We show that a large effective number of commodities can be a source of equilibrium stability and uniqueness: expanding substitution opportunities strengthens aggregate substitution effects. We study finite dated-commodity exchange economies obtained by truncating a countably infinite-horizon environment with discounte…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-05-04 · Yadi Wen, Tianxin Li, Enji Liang, Rong Du, Yue Fu
General AI
We study example-level private supervised speech classification under a practical release constraint: training may access privileged side information, but the released model must be audio-only. This setting is important because speech systems can often exploit richer side information during development, whereas deploym…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-05-04 · Zalán Gyenis, Miklós Rédei, Leszek Wroński
General AI
In a recent paper [Redei-Jing2026] the notion of conditional $p$-inaccessibility of a decision based on utility maximization was defined and examples of conditionally $p$-inaccessible decisions were given. The conditional inaccessibility of a decision based on maximizing utility calculated by a probability measure…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-05-04 · Haorui Li, Zhenghui He, Xuanzi Liu, Yang Xu, Dongsheng Liu, Jiakang Ma, Lupan Wu, Yangjie Wu, Xiongchao Tang, Tianhui Shi
General AI
Open-weight large language models (LLMs) are often described as downloadable model artifacts, but in production they are increasingly consumed as hosted APIs. This paper studies the intermediary service layer that turns a model release into an operational endpoint. Using sampled request logs, provider metadata, compati…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.2
2026-05-05 · Stephen Price, Kyle Miller, Marco Musto, Kenneth Kroenlein, James Saal, Kyle Tsaknopoulos, Elke A. Rundensteiner, Danielle L. Cote
General AI
Cold spraying is an increasingly common approach for repairing and manufacturing components due to its solid-state manufacturing capabilities. However, process optimization remains difficult due to many interdependent parameters and the lack of large-scale, machine-readable data to support modeling. While the scientifi…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.0
2026-03-28 · Haoyu He, Yue Zhuo, Yu Zheng, Qi R. Wang
General AI
Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly understood. In this work, we study VLMs through the lens of neural topology, representing each layer as a within-layer correlation graph derived from neuron-neuron co-activa…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.0
2026-03-29 · Yue Huang, Yu Jiang, Wenjie Wang, Haomin Zhuang, Xiaonan Luo, Yuchen Ma, Zhangchen Xu, Zichen Chen, Nuno Moniz, Zinan Lin, Pin-Yu Chen, Nitesh V Chawla, Nouha Dziri, Huan Sun, Xiangliang Zhang
General AI
Multi-agent systems composed of large generative models are rapidly moving from laboratory prototypes to real-world deployments, where they jointly plan, negotiate, and allocate shared resources to solve complex tasks. While such systems promise unprecedented scalability and autonomy, their collective interaction also …
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.0
2026-03-30 · Tianle Zeng, Hanxuan Chen, Yanci Wen, Hong Zhang
General AI
The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.0
2026-03-30 · Haiyue Song, Masao Utiyama
General AI
Continual pre-training is widely used to adapt LLMs to target languages and domains, yet the mixture ratio of training data remains a sensitive hyperparameter that is expensive to tune: it must be fixed before training begins, and a suboptimal choice can waste weeks of compute. In this work, we propose OptiMer, which…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.0
2026-04-06 · Asiri Dalugoda
General AI
Agentic AI systems increasingly execute consequential actions on behalf of human principals, delegating tasks through multi-step chains of autonomous agents. No existing standard addresses a fundamental accountability gap: verifying that terminal actions in a delegation chain were genuinely authorized by a human princi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.0
2026-04-07 · Changxin Ke, Rui Zhang, Jiaming Guo, Yuanbo Wen, Li Ding, Shuo Wang, Xuyuan Zhu, Xiong Peng, Di Huang, Zidong Du, Xing Hu, Qi Guo, Yunji Chen
General AI
Large Language Models (LLMs) achieve strong program repair performance but often suffer from over-editing, where excessive modifications overwrite correct code and hinder bug localization. We systematically quantify its impact and introduce the precise repair task, which maximizes reuse of correct code while fixing only bu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.0
2026-04-27 · Xinxin Liu, Ming Li, Zonglin Lyu, Yuzhang Shang, Chen Chen
General AI
Human visual preferences are inherently multi-dimensional, encompassing aesthetics, detail fidelity, and semantic alignment. However, existing datasets provide only single, holistic annotations, resulting in severe label noise: images that excel in some dimensions but are deficient in others are simply marked as winner…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.0
2026-04-27 · Laki Iinbor, Zhiyang Dou, Wojciech Matusik
General AI
We introduce Soft Anisotropic Diagrams (SAD), an explicit and differentiable image representation parameterized by a set of adaptive sites in the image plane. In SAD, each site specifies an anisotropic metric and an additively weighted distance score, and we compute pixel colors as a softmax blend over a small per-pixe…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-03-22 · Shih-Wen Liu, Yen-Chang Chen, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang
General AI
Multi-task learning (MTL) aims to enable a single model to solve multiple tasks efficiently; however, current parameter-efficient fine-tuning (PEFT) methods remain largely limited to single-task adaptation. We introduce Free Sinewich, a parameter-efficient multi-task learning framework that enables near-zero-c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 4.8
2026-03-26 · Chengshuai Yang
General AI
Designing a computational imaging system -- selecting operators, setting parameters, validating consistency -- requires weeks of specialist effort per modality, creating an expertise bottleneck that excludes the broader scientific community from prototyping imaging instruments. We introduce spec.md, a structured specif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-03-26 · Yannick Roy
General AI
Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User x 1000', where an L…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-03-27 · Farhan Fuad Abir, Sanjeda Sara Jennifer, Niloofar Yousefi, Laura J. Brattain
General AI
We propose a hybrid diffusion-based augmentation framework to overcome the critical challenge of ultrasound data augmentation in breast ultrasound (BUS) datasets. Unlike conventional diffusion-based augmentations, our approach improves visual fidelity and preserves ultrasound texture by combining text-to-image generati…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-03-30 · Shuang Zhou, Kai Yu, Zaifu Zhan, Huixue Zhou, Min Zeng, Feng Xie, Zhiyi Sha, Rui Zhang
General AI
Epilepsy and psychogenic non-epileptic seizures often present with similar seizure-like manifestations but require fundamentally different management strategies. Misdiagnosis is common and can lead to prolonged diagnostic delays, unnecessary treatments, and substantial patient morbidity. Although prolonged video-electr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-03-30 · N Alex Cayco Gajic, Arthur Pellegrino
General AI
Similarity measures are widely used to interpret the representational geometries used by neural networks to solve tasks. Yet, because existing methods compare the extrinsic geometry of representations in state space, rather than their intrinsic geometry, they may fail to capture subtle yet crucial distinctions between …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-03-30 · Conrad Borchers, Valdemar Švábenský, Sandesh K. Kafle, Kevin K. Tang, Jan Vykopal
General AI
Instructional alignment, the match between intended cognition and enacted activity, is central to effective instruction but hard to operationalize at scale. We examine alignment in cybersecurity simulations using multimodal traces from 23 teams (76 students) across five exercise sessions. Study 1 codes objectives and t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-03-30 · Liliang Ren, Yang Liu, Yelong Shen, Weizhu Chen
General AI
Scaling laws for large language models depend critically on the optimizer and parameterization. Existing hyperparameter transfer laws are mainly developed for first-order optimizers, and they do not structurally prevent training instability at scale. Recent hypersphere optimization methods constrain weight matrices to …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-03-31 · Badhan Mazumder, Sir-Lord Wiafe, Aline Kotoski, Vince D. Calhoun, Dong Hye Ye
General AI
Understanding how brain structure and function interact is key to explaining intelligence, yet modeling them jointly is challenging, as the structural and functional connectomes capture complementary aspects of organization. We introduce Multi-scale Adaptive Graph Network (MAGNet), a Transformer-style graph neural networ…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-02 · Junxuan Li, Rawal Khirodkar, Chengan He, Zhongshi Jiang, Giljoo Nam, Lingchen Yang, Jihyun Lee, Egor Zakharov, Zhaoen Su, Rinat Abdrashitov, Yuan Dong, Julieta Martinez, Kai Li, Qingyang Tan, Takaaki Shiratori, Matthew Hu, Peihong Guo, Xuhua Huang, Ariyan Zarei, Marco Pesavento, Yichen Xu, He Wen, Teng Deng, Wyatt Borsos, Anjali Thakrar, Jean-Charles Bazin, Carsten Stoll, Ginés Hidalgo, James Booth, Lucy Wang, Xiaowen Ma, Yu Rong, Sairanjith Thalanki, Chen Cao, Christian Häne, Abhishek Kar, Sofien Bouaziz, Jason Saragih, Yaser Sheikh, Shunsuke Saito
General AI
High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data due to limited scale and the domain gap betw…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-04-02 · Naomi Kombol, Ivan Martinović, Siniša Šegvić, Giorgos Tolias
General AI
Foundational Vision Transformers (ViTs) have limited effectiveness in tasks requiring fine-grained spatial understanding, due to their fixed pre-training resolution and inherently coarse patch-level representations. These challenges are especially pronounced in dense prediction scenarios, such as open-vocabulary segmen…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-04-02 · Andrew Ang, Nazym Azimbayev, Andrey Kim
General AI
Agentic AI shifts the investor's role from analytical execution to oversight. We present an agentic strategic asset allocation pipeline in which approximately 50 specialized agents produce capital market assumptions, construct portfolios using over 20 competing methods, and critique and vote on each other's output. A r…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-04-03 · Bin Liu, Zhixiang Xiong, Zhifen He, Bo Li
General AI
Speech-driven three-dimensional (3D) facial animation synthesis aims to build a mapping from one-dimensional (1D) speech signals to time-varying 3D facial motion signals. Current methods still face challenges in maintaining lip-sync accuracy and producing realistic facial expressions, primarily due to the highly ill-po…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-04-06 · Connor Dilgren, Sarah Wiegreffe
General AI
Latent reasoning models (LRMs) have attracted significant research interest due to their low inference cost (relative to explicit reasoning models) and theoretical ability to explore multiple reasoning paths in parallel. However, these benefits come at the cost of reduced interpretability: LRMs are difficult to monitor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-06 · Yang Li, Qiang Sheng, Zhengjia Wang, Yehan Yang, Danding Wang, Juan Cao
General AI
The misuse of large language models (LLMs) requires precise detection of synthetic text. Existing works mainly follow binary or ternary classification settings, which can at best distinguish pure human/LLM text or collaborative text. This remains insufficient for nuanced regulation, as the LLM-polished human t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-06 · Vadim Vashkelis, Natalia Trukhina
General AI
Mixture-of-Experts (MoE) architectures enable conditional computation by activating only a subset of model parameters for each input. Although sparse routing has been highly effective in language models and has also shown promise in vision, most vision MoE methods operate at the image or patch level. This granularity i…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-06 · Sudarshan Rajagopalan, Vishal M. Patel
General AI
Pre-trained diffusion models have enabled significant advancements in All-in-One Restoration (AiOR), offering improved perceptual quality and generalization. However, diffusion-based restoration methods primarily rely on fine-tuning or ControlNet-style modules to leverage the pre-trained diffusion model's priors for A…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-07 · Maissam Barkeshli, Michael R. Douglas, Michael H. Freedman
General AI
Recent progress in artificial intelligence (AI) is unlocking transformative capabilities for mathematics. There is great hope that AI will help solve major open problems and autonomously discover new mathematical concepts. In this essay, we further consider how AI may open a grand perspective on mathematics by forging …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-07 · Yasmeen Saeed, Ahmed Sharshar, Mohsen Guizani
General AI
Detecting cyberattacks in photovoltaic (PV) monitoring and MPPT control signals requires models that are robust to bias, drift, and transient spikes, yet lightweight enough for resource-constrained edge controllers. While deep learning outperforms traditional physics-based diagnostics and handcrafted features, standard…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-07 · Zhengming Yu, Li Ma, Mingming He, Leo Isikdogan, Yuancheng Xu, Dmitriy Smirnov, Pablo Salamanca, Dao Mi, Pablo Delgado, Ning Yu, Julien Philip, Xin Li, Wenping Wang, Paul Debevec
General AI
Most digital videos are stored in 8-bit low dynamic range (LDR) formats, where much of the original high dynamic range (HDR) scene radiance is lost due to saturation and quantization. This loss of highlight and shadow detail precludes mapping accurate luminance to HDR displays and limits meaningful re-exposure in post-…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-07 · Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo, Parisa Hamedi, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani
General AI
Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise in translating low-level representations into high-level source code, most exis…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-09 · Jiayuan Ye, Vitaly Feldman, Kunal Talwar
General AI
Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on knowledge-intensive tasks. In this paper, we formalize fact memorization from an information-theoretic perspective and study how training data distributions affect fact ac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-04-27 · Jun Li, Mingxuan Liu, Jiazhen Pan, Che Liu, Wenjia Bai, Cosmin I. Bercea, Julia A. Schnabel
General AI
Clinical abnormality grounding for rare diseases is often hindered by data scarcity, making supervised fine-tuning impractical and single-pass inference highly unstable. We propose Dynamic Decision Learning (DDL), a framework that enables frozen large vision-language models (LVLMs) to refine their decisions across both…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-04-28 · Xiaodong Li, Jiawei Sheng, Jiangxia Cao, Xinghua Zhang, Wenyuan Zhang, Yong Sun, Shirui Pan, Zhihong Tian, Tingwen Liu
General AI
Cross-domain recommendation (CDR) has been shown to be an effective solution for alleviating the user cold-start problem. By leveraging the rich user-item interactions available in an informative source domain, CDR can improve recommendation performance for cold-start users in the target domain. Previous CDR ap…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-04-28 · Shan Yu, Junyi Shu, Yuanjiang Ni, Kun Qian, Xue Li, Yang Wang, Jinyuan Zhang, Ziyi Xu, Shuo Yang, Lingjun Zhu, Ennan Zhai, Qingda Lu, Jiarong Xing, Youyou Lu, Xin Jin, Xuanzhe Liu, Harry Xu
General AI
As LLM applications grow more complex, developers are increasingly adopting multi-agent architectures to decompose workflows into specialized, collaborative components, introducing structure that constrains agent behavior and exposes useful semantic predictability. Unlike traditional LLM serving, which operates under h…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-04-28 · Felipe Arnholda, Flavio Rocha, Lucio Prade, Cristiano Bonato Both
General AI
Network Slice as a Service (NSaaS) is a key enabler of Beyond Fifth Generation (5G) and Sixth Generation (6G) networks, supporting next-generation applications such as extended reality (XR), immersive services, and the tactile Internet. These networks must provide native support for slice-aware services across the enti…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.8
2026-04-28 · Sicheng Dai, Kai Chen, Hongwang Xiao, Shan Yu, Qiwei Ye
General AI
Recent self-supervised pre-training methods for electroencephalogram (EEG) have shown promising results. However, the pre-trained models typically require full fine-tuning on each downstream task individually to achieve good performance. In practical applications involving multiple tasks, utilizing a separate model for…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-11 · Kunho Kim, Sumin Seo, Yongjun Cho, Hyungjin Chung
General AI
We propose EditCrafter, a high-resolution image editing method that operates without tuning, leveraging pretrained text-to-image (T2I) diffusion models to process images at resolutions significantly exceeding those used during training. Leveraging the generative priors of large-scale T2I diffusion models enables the de…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-11 · Ivan Sedykh, Nikita Sorokin, Valentin Malykh
General AI
Recent advances in masked diffusion language models (MDLMs) narrow the quality gap to autoregressive LMs, but their sampling remains expensive because generation requires many full-sequence denoising passes with a large Transformer and, unlike autoregressive decoding, cannot benefit from KV caching. In this work, we ex…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-11 · Gordon Chen, Ziqi Huang, Ziwei Liu
General AI
Video diffusion models have achieved remarkable progress in generating high-quality videos. However, these models struggle to represent the temporal succession of multiple events in real-world videos and lack explicit mechanisms to control when semantic concepts appear, how long they persist, and the order in which mul…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-11 · Khai Loong Aw, Klemen Kotar, Wanhee Lee, Seungwoo Kim, Khaled Jedoui, Rahul Venkatesh, Lilian Naing Chen, Michael C. Frank, Daniel L. K. Yamins
General AI
Young children demonstrate early abilities to understand their physical world, estimating depth, motion, object coherence, interactions, and many other aspects of physical scene understanding. Children are both data-efficient and flexible cognitive systems, creating competence despite extremely limited training data, w…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-14 · Liran Ringel, Yaniv Romano
General AI
Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. DFlash shows that a block diffusion drafter can generate an entire draft block in a single forward pass and achieve state-of-the-art specula…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-16 · Noor Islam S. Mohammad
General AI
Federated learning (FL) enables collaborative intrusion detection without raw data exchange, but conventional FL incurs high communication overhead from full-precision gradient transmission and remains vulnerable to gradient inference attacks. This paper presents EdgeDetect, a communication-efficient and privacy-aware …
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-16 · Adam Rida
General AI
Every call to an LLM classification endpoint produces a labeled input-output pair already retained in production logs. These pairs constitute a free, growing training set: a lightweight surrogate trained on them can absorb a significant portion of future traffic at near-zero marginal inference cost. The open questions …
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-17 · Yuval Haitman, Amit Efraim, Joseph M. Francos
General AI
We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalit…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-17 · Zijun Wang, Haoqin Tu, Weidong Zhou, Yiyang Zhou, Xiaohuan Zhou, Bingni Zhang, Weiguo Feng, Taifeng Wang, Cihang Xie, Fengze Liu
General AI
Everyday tasks come with a target, and pretraining models around this target is what turns them into experts. In this paper, we study target-oriented language model (LM) pretraining by introducing Neuron-Activated Graph Ranking (NAG-based Ranking), a training-free and interpretable framework for target pretraining data…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-19 · Kadir-Kaan Özer, René Ebeling, Markus Enzweiler
General AI
We introduce JuRe (Just Repair), a minimal denoising network for time series anomaly detection that exposes a central finding: architectural complexity is unnecessary when the training objective correctly implements the manifold-projection principle. JuRe consists of a single depthwise-separable convolutional residual …
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-19 · Theodoros Kouzelis, Spyros Gidaris, Nikos Komodakis
General AI
Joint image-feature generative modeling has recently emerged as an effective strategy for improving diffusion training by coupling low-level VAE latents with high-level semantic features extracted from pre-trained visual encoders. However, existing approaches rely on a fixed representation space, constructed independen…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-21 · Zhengwentai Sun, Keru Zheng, Chenghong Li, Hongjie Liao, Xihe Yang, Heyuan Li, Yihao Zhi, Shuliang Ning, Shuguang Cui, Xiaoguang Han
General AI
Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-view data. Existing methods often address these factors separately, resulting in limited controllability or reduced visual quality. We revisit this problem from an imag…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-21 · Faisal Alherran
General AI
Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1,400 hours of recitation audio from over 600 distinct reciters, providing substantial variation …
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-22 · Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Dingyu Yao, Zheng Lin, Peng Fu, Nan Duan, Jiaqi Wang
General AI
Reinforcement learning with verifiable rewards (RLVR) has become a core post-training recipe. Introducing suitable off-policy trajectories into on-policy exploration accelerates RLVR convergence and raises the performance ceiling, yet finding a source of such trajectories remains the key challenge. Existing mixed-polic…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-04-23 · Shiyan Su, Ruyi Zha, Danli Shi, Hongdong Li, Xuelian Cheng
General AI
Neural representations (NRs), such as neural fields and 3D Gaussians, effectively model volumetric data in computed tomography (CT) but suffer from severe artifacts under sparse-view settings. To address this, we propose DiffNR, a novel framework that enhances NR optimization with diffusion priors. At its core is Slice…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-11 · Qian Gao, Ruikang Zhong, Yuanwei Liu
General AI
Segmented pinching antenna assisted integrated sensing and communication (ISAC) systems enable flexible spatial resource utilization by allowing different waveguide segments to be dynamically configured for transmission and reception. However, the resulting design requires the joint optimization of antenna deployment, …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-13 · Xiaoting Wei, Lele Kang, Xuelian Pan, Jiannan Yang
General AI
The rapid growth of open-source large language models (LLMs) has created a complex ecosystem of model inheritance and reuse. However, existing research has focused mainly on descriptive analyses of lineage evolution, with limited attention to identifying which models play a disruptive role in shaping subsequent develop…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-13 · Younghwan Cho, Richard Sowers
General AI
Koopman operator theory is a key tool in data assimilation of complex dynamical systems, with the potential to be applied to multimodal data. We formulate the problem of learning Koopman eigenfunctions from observations at arbitrary, possibly non-vanishing, time intervals as an optimization problem. Analysis of the for…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-13 · Ricardo Bessa, Rui Claro, João Trindade, João Lourenço
General AI
Large Language Models (LLMs) are redefining offensive cybersecurity by allowing the generation of harmful machine code with minimal human intervention. While attackers take advantage of dark LLMs such as XXXGPT and WolfGPT to produce malicious code, ethical hackers can follow similar approaches to automate traditional …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-14 · Baris Sarper Tezcan, Hrishikesh Viswanath, Rubab Saher, Daniel Aliaga
General AI
Urban areas are increasingly vulnerable to thermal extremes driven by rapid urbanization and climate change. Traditionally, thermal extremes have been monitored using Earth-observing satellites and numerical modeling frameworks. For example, land surface temperature derived from Landsat or Sentinel imagery is commonly …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-14 · Jonan Richards
General AI
Large Language Models (LLMs) have shown much promise in powering a variety of software engineering (SE) tools. Offering natural language as an intuitive interaction mechanism, LLMs have recently been employed as conversational "programming assistants" capable of supporting several SE activities simultaneously. As wit…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-15 · Jiun Tian Hoe, Weipeng Hu, Xudong Jiang, Yap-Peng Tan, Chee Seng Chan
General AI
Human-Object Interaction (HOI) modelling captures how humans act upon and relate to objects, typically expressed as <person, action, object> triplets. Existing approaches split into two disjoint families: HOI generation synthesises scenes from structured triplets and layout, but fails to integrate mixed conditions like…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-16 · Yury Gorishniy, Ivan Rubachev, Dmitrii Feoktistov, Artem Babenko
General AI
MLP is a heavily used backbone in modern deep learning (DL) architectures for supervised learning on tabular data, and AdamW is the go-to optimizer used to train tabular DL models. Unlike architecture design, however, the choice of optimizer for tabular DL has not been examined systematically, despite new optimizers sh…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-16 · Alex A. T. Rathke
General AI
We show that a rational agent with true and refinable knowledge of events cannot know if she knows everything or not. This epistemic limitation is not resolved by introspection about tautologies or by learning about new events.
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-16 · Zhanhao Liang, Tao Yang, Jie Wu, Chengjian Feng, Liang Zheng
General AI
This paper focuses on the alignment of flow matching models with human preferences. A promising way is fine-tuning by directly backpropagating reward gradients through the differentiable generation process of flow matching. However, backpropagating through long trajectories results in prohibitive memory costs and gradi…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-16 · Eden Frenkel, Kenneth L. McMillan, Oded Padon, Sharon Shoham
General AI
We propose an incremental approach for safety proofs that decomposes a proof with a complex inductive invariant into a sequence of simpler proof steps. Our proof system combines rules for (i) forward reasoning using inductive invariants, (ii) backward reasoning using inductive invariants of a time-reversed system, and …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-17 · Florian Furbach, Lucas Clorius, Roland Kuhn, Hernán Melgratti, Alceste Scalas, Emilio Tuosto
General AI
Swarm protocols are a recently introduced formalism for specifying, implementing, and verifying peer-to-peer systems called swarms. A swarm consists of distributed agents called machines that communicate by asynchronous event propagation. Following a local-first model, each machine can progress without requiring contin…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-17 · Sean Hill, Felix X. -F. Ye
General AI
Stochastic dynamical systems with slow or metastable behavior evolve, on long time scales, on an unknown low-dimensional manifold in high-dimensional ambient space. Building a reduced simulator from short-burst ambient ensembles is a long-standing problem: local-chart methods like ATLAS suffer from exponential landmark…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-17 · Kyunghoo Mun, Matthew Rosenzweig
General AI
We study phase transitions for repulsive-attractive mean-field free energies on the circle. For a $\frac{1}{n+1}$-periodic interaction whose Fourier coefficients satisfy a certain decay condition, we prove that the critical coupling strength $K_c$ coincides with the linear stability threshold $K_\#$ of the uniform dist…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-17 · Shaoqing Liu, Mushuang Liu
General AI
Computational complexity has been a major challenge in game-theoretic model predictive control (GT-MPC), as real-time solutions to a game (e.g., Nash equilibria (NEs)) have to be computed at each sampling instant of an MPC. This challenge is especially critical in autonomous driving, where interactions may involve many…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-22 · Mikko Lempinen, Joni Kemppainen, Niklas Raesalmi
General AI
As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of high-profile exploits and consequential system failures. Yet systematic approaches to evaluating AI security remain underdeveloped. In this paper, we introduce AVISE (AI Vulner…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-22 · Mattias Ehatamm, Peter Nelson, Fernanda Rivera Omana
General AI
We generalize the well-studied notion of a modular pair of a finite matroid to arbitrary families of sets in infinite matroids, and use it to develop the theory of infinite matroids in several as-yet-unexplored areas. Our results include a complete theory of single-element extensions, a description of the relationship …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-23 · Deepank Girish, Yi Hao Chan, Sukrit Gupta, Jing Xia, Jagath C. Rajapakse
General AI
Several brain foundation models (FM) have recently been proposed to predict brain disorders by modelling dynamic functional connectivity (FC). While they demonstrate remarkable model performance and zero- or few-shot generalization, the salient features identified as potential biomarkers are yet to be thoroughly evalua…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-23 · Bingcong Li, Yilang Zhang, Georgios B. Giannakis
General AI
Low-rank adaptation (LoRA) has emerged as the de facto standard for parameter-efficient fine-tuning (PEFT) of foundation models, enabling the adaptation of billion-parameter networks with minimal computational and memory overhead. Despite its empirical success and rapid proliferation of variants, it remains elusive whi…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-23 · Kippei Mizuta, Shoichiro Tanaka, Shuhei Tanaka, Toshiharu Hatanaka
General AI
Local Optima Networks (LONs) represent the global structure of search spaces as graphs, but their construction requires iterative execution of a search algorithm to find local optima and approximate transitions between Basins of Attraction (BoAs). In continuous optimization, this high computational cost prevents system…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-24 · Yulin Liu
General AI
This study presents a structured dataset of blockchain-registered artificial intelligence agents under the ERC-8004 standard on Ethereum. The dataset integrates on-chain identity records, minting transactions, transfer events, reputation summaries, and individual feedback records, together with resolved off-chain metad…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-24 · Calvin Tsay
General AI
ReLU neural networks trained as surrogate models can be embedded exactly in mixed-integer linear programs (MILPs), enabling global optimization over the learned function. The tractability of the resulting MILP depends on structural properties of the network, i.e., the number of binary variables in associated formulatio…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-24 · Sijie Li, Shanda Li, Haowei Lin, Weiwei Sun, Ameet Talwalkar, Yiming Yang
General AI
Scaling laws are used to plan multi-million-dollar training runs, but fitting those laws can itself cost millions. In modern large-scale workflows, assembling a sufficiently informative set of pilot experiments is already a major budget-allocation problem rather than a routine preprocessing step. We formulate scaling-l…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-24 · Ariel Yuhan Ong, Iain Livingstone, Caroline Kilduff, Mertcan Sevgi, David A Merle, Eden Ruffell, Pearse A Keane, Fares Antaki
General AI
Clinicians often face workflow problems that are perceived as either too bespoke or low stakes to attract commercial attention. Historically, most do not have the technical knowledge to address these problems, but the recent emergence of "vibe coding" presents a transformative opportunity. Vibe coding refers to the co-…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-25 · Giorgio Cruciata, Luca Cruciata, Liliana Lo Presti, Jan Van Gemert, Marco La Cascia
General AI
This paper proposes a new method to improve the training efficiency of deep convolutional neural networks. During training, the method evaluates scores to measure how much each layer's parameters change and whether the layer will continue learning or not. Based on these scores, the network is scaled down such that the …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-27 · Marco A. Gallegos-Herrada, Vianey Leos-Barajas, Jeffrey S. Rosenthal
General AI
Bayesian inference in hidden Markov models (HMMs) can be challenging due to the presence of multimodality in the likelihood function, and consequently in the joint posterior distribution, even after correcting for label switching. The parallel tempering (PT) algorithm, a state-space augmentation method, is a widely use…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-27 · Shiliang Zuo
General AI
Linear contracts are ubiquitous in practice, yet optimal contract theory often prescribes complex, nonlinear structures. We provide a distributional robustness justification for linear contracts. We study a principal-agent problem where the agent exerts costly effort across multiple tasks, generating a stochastic signa…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-27 · Griffin Pitts, Muntasir Hoq, Peter Brusilovsky, Narges Norouzi, Arto Hellas, Juho Leinonen, Bita Akram
General AI
Adaptive programming practice often relies on fixed libraries of worked examples and practice problems, which require substantial authoring effort and may not correspond well to the logical errors and partial solutions students produce while writing code. As a result, students may receive learning content that does not…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-04-27 · Senthil Rajasekaran, Jean-François Raskin, Moshe Y. Vardi
General AI
As part of an effort to apply the rigorous guarantees of formal verification to multi-agent systems, the field of equilibrium analysis, also called rational verification, studies equilibria in multiplayer games to reason about system-level properties such as safety and scalability. While most prior work focuses on dete…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-03-15 · Jaeyo Shin, Jiwook Kim, Hyunjung Shim
General AI
Representation Alignment (REPA) has emerged as a simple way to accelerate Diffusion Transformer training in latent space. At the same time, pixel-space diffusion transformers such as Just image Transformers (JiT) have attracted growing attention because they remove the dependency on a pretrained tokenizer, and then avoi…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-03-25 · Danil Tokhchukov, Aysel Mirzoeva, Andrey Kuznetsov, Konstantin Sobolev
General AI
In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can significantly improve the performance of DiT blocks. Building on this i…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-03-25 · Yihan Wang, Jia Deng
General AI
We introduce WAFT-Stereo, a simple and effective warping-based method for stereo matching. WAFT-Stereo demonstrates that cost volumes, a common design used in many leading methods, are not necessary for strong performance and can be replaced by warping with improved efficiency. WAFT-Stereo ranks first on ETH3D, KITTI a…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-03-26 · Niccolò Cavagnero, Narges Norouzi, Gijs Dubbelman, Daan de Geus
General AI
Vision Foundation Models (VFMs) pre-trained at scale enable a single frozen encoder to serve multiple downstream tasks simultaneously. Recent VFM-based encoder-only models for image and video segmentation, such as EoMT and VidEoMT, achieve competitive accuracy with remarkably low latency, yet they require finetuning th…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-03-26 · Ligong Han, Hao Wang, Han Gao, Kai Xu, Akash Srivastava
General AI
Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressi…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-03-27 · Ruixing Zhang, Hanzhang Jiang, Leilei Sun, Liangzhe Han, Jibin Wang, Weifeng Lv
General AI
Mobile devices continuously interact with cellular base stations, generating massive volumes of signaling records that provide broad coverage for understanding human mobility. However, such records offer only coarse location cues (e.g., serving-cell identifiers) and therefore limit their direct use in applications that…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-03-30 · Yufei Xu, Fanxu Meng, Fan Jiang, Yuxuan Wang, Ruijie Zhou, Jiexi Wu, Zhixin Pan, Zhaohui Wang, Xiaojuan Tang, Wenjie Pei, Tongxuan Liu, Di yin, Xing Sun, Muhan Zhang
General AI
Token-level sparse attention mechanisms, exemplified by DeepSeek Sparse Attention (DSA), achieve fine-grained key selection by scoring every historical token for each query using a lightweight indexer, and then computing attention only over the selected subset. While the downstream sparse attention scales efficiently, …
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-04-04 · Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, Zhengzhong Tu
General AI
Foundation models deliver strong perception but are often too computationally heavy to deploy, and adapting them typically requires costly annotations. We introduce a semi-supervised knowledge distillation (SSKD) framework that compresses pre-trained vision foundation models (VFMs) into compact experts using limited la…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-04-06 · DataFlow Team, Bohan Zeng, Daili Hua, Kaixin Zhu, Yifan Dai, Bozhou Li, Yuran Wang, Chengzhuo Tong, Yifan Yang, Mingkun Chang, Jianbin Zhao, Zhou Liu, Hao Liang, Xiaochen Ma, Ruichuan An, Junbo Niu, Zimo Meng, Tianyi Bai, Meiyi Qiang, Huanyao Zhang, Zhiyou Xiao, Tianyu Guo, Qinhan Yu, Runhao Zhao, Zhengpin Li, Xinyi Huang, Yisheng Pan, Yiwen Tang, Yang Shi, Yue Ding, Xinlong Chen, Hongcheng Gao, Minglei Shi, Jialong Wu, Zekun Wang, Yuanxing Zhang, Xintao Wang, Pengfei Wan, Yiren Song, Mike Zheng Shou, Wentao Zhang
General AI
World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remains lacking. In this paper, we introduce OpenWorldLib, a comprehensive and standardized inference framework for Advanced World Models. Drawing on the evolution of world m…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-04-08 · Jiyun Won, Heemin Yang, Woohyeok Kim, Jungseul Ok, Sunghyun Cho
General AI
Recent work has explored optimizing image signal processing (ISP) pipelines for various tasks by composing predefined modules and adapting them to task-specific objectives. However, jointly optimizing module sequences and parameters remains challenging. Existing approaches rely on neural architecture search (NAS) or st…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.0
2026-04-09 · Jindi Lv, Hao Li, Jie Li, Yifei Nie, Fankun Kong, Yang Wang, Xiaofeng Wang, Zheng Zhu, Chaojun Ni, Qiuping Deng, Hengtao Li, Jiancheng Lv, Guan Huang
General AI
Vision-language-action (VLA) models have advanced robot manipulation through large-scale pretraining, but real-world deployment remains challenging due to partial observability and delayed feedback. Reinforcement learning addresses this via value functions, which assess task progress and guide policy improvement. Howev…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-24 · Jingchen Ni, Quan Zhang, Dan Jiang, Keyu Lv, Ke Zhang, Chun Yuan
General AI
Existing camouflaged object detection (COD) methods typically rely on fully-supervised learning guided by mask annotations. However, obtaining mask annotations is time-consuming and labor-intensive. Compared to fully-supervised methods, existing weakly-supervised COD methods exhibit significantly poorer performance. Eve…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-25 · Xiaoyu Tang, Jun Dong, Jintao Cheng, Rui Fan
General AI
Remote sensing visual grounding (RSVG) aims to localize specific targets in remote sensing images using natural language expressions. However, existing methods are restricted to single-sensor domains, i.e., either optical or synthetic aperture radar (SAR), limiting their real-world applicability. In this paper, we intr…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-26 · Koichi Takahashi
General AI
This paper introduces Conchordal, a bio-acoustic instrument for generative composition whose sonic agents are governed by artificial life dynamics within a psychoacoustic fitness landscape. The system is built on Direct Cognitive Coupling (DCC), a design principle requiring that generative dynamics operate directly wit…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-26 · Dingxi Zhang, Fangjinhua Wang, Marc Pollefeys, Haofei Xu
General AI
Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-26 · Kubra Aksoy, Adnan Rashid, Osman Hasan, Sofiene Tahar
General AI
Network topology matrices are algebraic representations of graphs that are widely used in modeling and analysis of various applications including electrical circuits, communication networks and transportation systems. In this paper, we propose to use Higher-Order-Logic (HOL) based interactive theorem proving to formali…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-26 · Yawen Luo, Xiaoyu Shi, Junhao Zhuang, Yutian Chen, Quande Liu, Xintao Wang, Pengfei Wan, Tianfan Xue
General AI
Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. By reformulat…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-28 · Junyoung Koh, Hoyeon Moon, Dongha Kim, Seungmin Lee, Sanghyun Park, Min Song
General AI
Text-to-image models such as Stable Diffusion have achieved unprecedented levels of high-fidelity visual synthesis. As these models advance, personalization of generative models -- commonly facilitated through Low-Rank Adaptation (LoRA) with a dedicated trigger token -- has become a significant area of research. Previo…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-30 · Seongkyu Choi, Jhonghyun An
General AI
Off-road semantic segmentation is fundamentally challenged by irregular terrain, vegetation clutter, and inherent annotation ambiguity. Unlike urban scenes with crisp object boundaries, off-road environments exhibit strong class-level similarity among terrain categories, resulting in thick and uncertain transition regi…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-30 · Sadra Safadoust, Fabio Tosi, Matteo Poggi, Fatma Güney
General AI
We present FlowIt, a novel architecture for optical flow estimation designed to robustly handle large pixel displacements. At its core, FlowIt leverages a hierarchical transformer architecture that captures extensive global context, enabling the model to effectively model long-range correspondences. To overcome the lim…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-30 · Yichi Zhang, Weihao Yuan, Yizhuo Zhang, Xidong Zhang, Jia Wan
General AI
Vision-Language-Action (VLA) models improve action generation by conditioning policies on rich vision-language information. However, current auto-regressive policies are constrained by three bottlenecks: (1) architectural bias drives models to overlook visual details, (2) an excessive number of visual tokens makes atte…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-30 · Lorenza Prospero, Orest Kupyn, Ostap Viniavskyi, João F. Henriques, Christian Rupprecht
General AI
Acquiring labeled datasets for 3D human mesh estimation is challenging due to depth ambiguities and the inherent difficulty of annotating 3D geometry from monocular images. Existing datasets are either real, with manually annotated 3D geometry and limited scale, or synthetic, rendered from 3D engines that provide preci…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-30 · Patrick Rim, Kevin Harris, Braden Copple, Shangchen Han, Xu Xie, Ivan Shugurov, Sizhe An, He Wen, Alex Wong, Tomas Hodan, Kun He
General AI
Accurate 3D understanding of human hands and objects during manipulation remains a significant challenge for egocentric computer vision. Existing hand-object interaction datasets are predominantly captured in controlled studio settings, which limits both environmental diversity and the ability of models trained on such…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-30 · Noam Kolt
General AI
The prospect of artificial superintelligence -- AI agents that can generally outperform humans in cognitive tasks and economically valuable activities -- will transform the legal order as we know it. Operating autonomously or under only limited human oversight, AI agents will assume a growing range of roles in the lega…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-31 · Ingyu Jang, Leila J. Bridgeman
General AI
Despite longstanding interest, controller synthesis remains challenging for networks of heterogeneous, nonlinear agents. Moreover, the requirements for computational scalability and information privacy have become increasingly critical. This paper introduces a dissipativity-based distributed controller synthesis framew…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-03-31 · Quanyan Zhu, Zhengye Han
General AI
This paper introduces a performative scenario optimization framework for decision-dependent chance-constrained problems. Unlike classical stochastic optimization, we account for the feedback loop where decisions actively shape the underlying data-generating process. We define performative solutions as self-consistent e…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-02 · Taha Ameen, Flore Sentenac, Sophie H. Yu
General AI
This paper studies how a fixed flexibility budget should be allocated across the two sides of a balanced bipartite matching market. We model compatibilities via a sparse bipartite stochastic block model in which flexible agents are more likely to connect with agents on the opposite side, and derive an exact variational…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-03 · Dennis Marquis, Mazen Farhood
General AI
This paper presents a reinforcement learning-based path-following controller for a fixed-wing small uncrewed aircraft system (sUAS) that is robust to certain actuator failures. The controller is conditioned on a parameterization of actuator faults using hypernetwork-based adaptation. We consider parameter-efficient for…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-06 · Chaoran Chen, Zhiping Zhang, Zeya Chen, Eryue Xu, Yinuo Yang, Ibrahim Khalilov, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li
General AI
LLM-powered computer-use agents (CUAs) are shifting users from direct manipulation to supervisory coordination. Existing oversight mechanisms, however, have largely been studied as isolated interface features, making broader oversight strategies difficult to compare. We conceptualize CUA oversight as a structural coord…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-06 · Zeyu Ma, Alexander Raistrick, Jia Deng
General AI
In this paper, we explore the design space of procedural rules for multi-view stereo (MVS). We demonstrate that we can generate effective training data using SimpleProc: a new, fully procedural generator driven by a very small set of rules using Non-Uniform Rational Basis Splines (NURBS), as well as basic displacement …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-06 · Siyuan Liu, Chaoqun Zheng, Xin Zhou, Tianrui Feng, Dingkang Liang, Xiang Bai
General AI
Scene-level point cloud understanding remains challenging due to diverse geometries, imbalanced category distributions, and highly varied spatial layouts. Existing methods improve object-level performance but rely on static network parameters during inference, limiting their adaptability to dynamic scene data. We propo…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-06 · Seungjae Son
General AI
We study the spectral gaps of parallel and simulated tempering chains targeting multimodal Gibbs measures. In particular, we consider chains constructed from Metropolis random walks that preserve the Gibbs distributions at a sequence of harmonically spaced temperatures. We prove that their spectral gaps admit polynomia…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-06 · Hyunsoo Cha, Wonjung Woo, Byungjun Kim, Hanbyul Joo
General AI
We present Vanast, a unified framework that generates garment-transferred human animation videos directly from a single human image, garment images, and a pose guidance video. Conventional two-stage pipelines treat image-based virtual try-on and pose-driven animation as separate processes, which often results in identi…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-07 · Haoyu Zhen, Zixian Gao, Qiao Sun, Yilin Zhao, Yuncong Yang, Yilun Du, Tsun-Hsuan Wang, Yi-Ling Qiao, Chuang Gan
General AI
World action models (WAMs) have emerged as a promising direction for robot policy learning, as they can leverage powerful video backbones to model the future states. However, existing approaches often rely on separate action modules, or use action representations that are not pixel-grounded, making it difficult to full…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-07 · Chen Song, Angela Fontan, Rong Su, Julien M. Hendrickx, Vladimir Cvetkovic, Karl H. Johansson
General AI
This paper presents a theoretical convergence analysis for an opinion-action coevolution model that integrates the opinion updating rule of the Hegselmann-Krause model with a utility-based decision-making mechanism. The model is reformulated into an augmented state-space representation, where the state matrix induces a…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-07 · Jonathan Bourne, Mwiza Simbeye, Joseph Nockels
General AI
The Character Error Rate (CER) is a key metric for evaluating the quality of Optical Character Recognition (OCR). However, this metric assumes that text has been perfectly parsed, which is often not the case. Under page-parsing errors, CER becomes undefined, limiting its use as a metric and making evaluating page-level…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-07 · Zhichao Jia, Guanghui Lan
General AI
Value iteration-type methods have been extensively studied for computing a nearly optimal value function in reinforcement learning (RL). Under a generative sampling model, these methods can achieve sharper sample complexity than policy optimization approaches, particularly in their dependence on the discount factor. In…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-09 · Mayur Deshmukh, Hiroyasu Akada, Helge Rhodin, Christian Theobalt, Vladislav Golyanik
General AI
Event cameras offer multiple advantages in monocular egocentric 3D human pose estimation from head-mounted devices, such as millisecond temporal resolution, high dynamic range, and negligible motion blur. Existing methods effectively leverage these properties, but suffer from low 3D estimation accuracy, insufficient in…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-09 · Giannis Fikioris, Balasubramanian Sivan, Éva Tardos
General AI
The study of repeated interactions between a learner and a utility-maximizing optimizer has yielded deep insights into the manipulability of learning algorithms. However, existing literature primarily focuses on independent, unlinked rounds, largely ignoring the ubiquitous practical reality of budget constraints. In th…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 3.8
2026-04-09 · Zhengyang Sun, Yu Chen, Xin Zhou, Xiaofan Li, Xiwu Chen, Dingkang Liang, Xiang Bai
General AI
Text-to-video diffusion models have enabled open-ended video synthesis, but often struggle with generating the correct number of objects specified in a prompt. We introduce NUMINA, a training-free identify-then-guide framework for improved numerical alignment. NUMINA identifies prompt-layout inconsistencies by selecti…
- Review
- pending
- Role
- unreviewed
- Read
- later