arxiv
Score 37.0
2026-06-29 · Byeong Hoon Yoon
Research Track A · General AI
We introduce Neural Subspace Reallocation (NSR), which reframes continual learning as memory management over parameter subspaces. Instead of treating Low-Rank Adaptation (LoRA) modules as disposable per-task adapters, NSR manages them as compressible, retrievable memory units on a frozen backbone through a recurring cy…
arxiv
Score 36.4
2026-06-22 · Ulas Berk Karli, Tesca Fitzgerald
Research Track A · General AI
Vision-Language-Action (VLA) models are commonly fine-tuned through passive imitation learning, where additional demonstrations are collected for tasks where the policy performs poorly. This approach incurs several downsides: it requires the robot to fail before data collection is triggered, provides little guidance ab…
arxiv
Score 35.0
2026-05-12 · Hamza Ahmed Durrani, Rafay Suleman Durrani
Research Track A · General AI
Large language-vision models (LVLMs) such as CLIP, Flamingo, and BLIP have revolutionized AI by enabling understanding across textual and visual modalities. These models excel at tasks like image captioning, visual question answering, and cross-modal retrieval. However, they face catastrophic forgetting when learning n…
arxiv
Score 34.5
2026-04-15 · Noureddine Kermiche
Research Track A · General AI
Catastrophic forgetting remains a primary hurdle in sequential task learning for artificial neural networks. We propose a silicon-native modular architecture that achieves structural parameter isolation using Task-Specific Experts and a distributed, outlier-based Gatekeeper. Moving beyond traditional sequential consoli…
arxiv
Score 30.0
2026-03-12 · Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin
Research Track A · General AI
Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophi…
arxiv
Score 29.5
2026-06-05 · Rahul Nair, Chun Tao
Research Track A · General AI
Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B) on mathematical reasoning tasks and uncover a critical vulnerability: Full Fine-Tuning (F…
arxiv
Score 29.0
2026-04-17 · Alexandra Dragomir, Ioana Pintilie, Antonio Barbalau, Marius Dragoi, Florin Brad, Cristian Daniel Paduraru, Alexandru Tifrea, Elena Burceanu, Radu Tudor Ionescu
Research Track A · General AI
Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs), by sequentially learning a low-rank update matrix for each task. To mitigate catastrophic forgetting, state-of-the-art approaches impose constraints on new adapters with respect to the previous ones,…
arxiv
Score 28.5
2026-04-09 · Xing Han Lù, Siva Reddy
Research Track B · General AI
Frontier LLMs can navigate complex websites, but their cost and reliance on third-party APIs make local deployment impractical. We introduce Agent-as-Annotators, a framework that structures synthetic trajectory generation for web agents by analogy to human annotation roles, replacing the Task Designer, Annotator, and S…
huggingface
Score 28.4
2026-06-22 · Haggai Roitman
General AI
The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central thesis: building great agentic systems requires understanding every layer of the pipeline, not ju…
arxiv
Score 27.5
2026-03-20 · Zijian Lu, Yiping Zuo, Yupeng Nie, Xin He, Weibei Fan, Chen Dai, Lianyong Qi, Shi Jin
Research Track B · General AI
Despite rapid progress in multimodal GUI agents, reusable skill acquisition remains difficult because on-demand generated skills often leave action semantics, state assumptions, and success criteria implicit. This makes them brittle to execution errors, hard to verify, and difficult to repair. We present ContractSkill,…
arxiv
Score 27.5
2026-06-09 · Toan Nguyen, Yang Liu, Trung Le, Celso de Melo, Flora D. Salim
Research Track A · General AI
We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forgetting at every step. When such knowledge is never revisited, these losses compound into lon…
arxiv
Score 27.0
2026-05-19 · Fatemeh Pesaran Zadeh, Seyeon Choi, Xing Han Lù, Siva Reddy, Gunhee Kim
Research Track B · General AI
Large language models (LLMs) have enabled web agents that follow natural language goals through multi-step browser interactions. However, agents fine-tuned on specific trajectories and domains often struggle to generalize out of domain, and offline training can be compute-inefficient due to noisy, redundant trajectories…
arxiv
Score 27.0
2026-06-29 · Bertram Taetz, Hugo Albuquerque Cosme da Silva, Gabriele Bleser-Taetz
Research Track A · General AI
Motion-language agents must possess the bidirectional capability to both understand human movement (motion-to-text, M2T) and generate it from natural language (text-to-motion, T2M). While foundational models have achieved strong performance in static settings, autonomous agents operating in dynamic environments must co…
huggingface
Score 26.4
2026-06-24 · Haoxiang Sun, Zhihang Yi, Langxuan Deng, Yuhao Zhou, Peiqi Jia, Jian Zhao, Li Yuan, Jiancheng Lv, Tao Wang
General AI
Fine-grained visual reasoning requires multimodal large language models (MLLMs) to identify task-relevant visual evidence and ground their reasoning in local image regions. Existing agentic methods typically rely on reinforcement learning with verifiable rewards or supervised fine-tuning on large-scale annotated reason…
arxiv
Score 26.3
2026-03-31 · Yinuo Liu, Zi Qian, Heng Zhou, Jiahao Zhang, Yajie Zhang, Zhihang Li, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang
General AI
Interleaved text-and-image generation represents a significant frontier for Multimodal Large Language Models (MLLMs), offering a more intuitive way to convey complex information. Current paradigms rely on either image generation or retrieval augmentation, yet they typically treat the two as mutually exclusive paths, fa…
arxiv
Score 26.3
2026-04-22 · Pavel Salovskii, Iuliia Gorshkova
General AI
This paper presents a hybrid architecture for intelligent systems in which large language models (LLMs) are extended with an external ontological memory layer. Instead of relying solely on parametric knowledge and vector-based retrieval (RAG), the proposed approach constructs and maintains a structured knowledge graph …
arxiv
Score 26.3
2026-06-08 · Hongcheng Gao, Hailong Qu, Jingyi Tang, Jiahao Wang, Zihao Huang, Hengkang Qiao, Shihong Huang, Junming Yang, Yi Li, Hongyixuan Yuan, Wenjie Li, Bohan Zeng, Wenbo Li, Bo Wang, Jianhui Liu, Olive Huang, Haoyang Huang, Wentao Zhang, Guoqing Huang, Nan Duan, Yinpeng Dong
General AI
Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, failing to assess general interactive spatial understan…
arxiv
Score 26.0
2026-03-29 · Ashish Pandey
Research Track A
Sequential fine-tuning of pretrained language encoders often overwrites previously acquired capabilities, but the forgetting behavior of parameter-efficient updates remains under-characterized. We present a controlled empirical study of Low-Rank Adaptation (LoRA) in sequential transformer encoder fine-tuning with compa…
arxiv
Score 26.0
2026-06-09 · Jayoo Hwang, Xiaowen Zhang, Vedant Padwal
Research Track B · General AI
Autonomous web navigation remains challenging for LLM agents, and the strongest generalist systems rely on proprietary reasoning models whose inference cost is prohibitive for the repetitive tasks where such agents would be most useful. We argue this gap stems not from insufficient model capability but from agent archi…
arxiv
Score 26.0
2026-06-12 · Sina Hajimiri, Masih Aminbeidokhti, Jose Dolz, Ismail Ben Ayed, Issam H. Laradji, Spandana Gella, Nicolas Gontier
Research Track B · General AI
Online web agents often augment a base actor with memory, workflow, or skill modules. These modules can improve performance, but they also consume test-time tokens, a cost rarely reported alongside the actor's inference cost. We study online augmentation, where this overhead is paid on every task, and re-evaluate its b…
arxiv
Score 25.9
2026-06-24 · Luke McDermott, Robert W. Heath, Rahul Parhi
Research Track A · General AI
Lifelong continual learning remains an obstacle on the path to human-like intelligence. Modern transformers show sparks of intelligence with in-context learning. The quadratic nature of attention, however, prohibits transformers from performing this process on arbitrarily long sequences. In this work, we argue that ext…
arxiv
Score 25.8
2026-05-21 · Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen, Zhengxi Lu, Fan Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao
General AI
The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs offer distinct advant…
arxiv
Score 25.5
2026-04-20 · Lixian Chen, Jianhong Tan
Research Track A
Adapting foundation models under resource budgets relies heavily on Parameter-Efficient Fine-Tuning (PEFT), with LoRA being a standard modular solution. However, LoRA suffers from spectral interference. Low-rank updates often concentrate energy on the leading singular directions of pretrained weights, perturbing genera…
arxiv
Score 25.0
2026-04-10 · Xingyu Shao, Zhiqiang Yan, Liangzheng Sun, Mengfan He, Chao Chen, Jinhui Zhang, Chunyu Li, Ziyang Meng
Research Track A · General AI
Robust geo-localization in changing environmental conditions is critical for long-term aerial autonomy. While visual place recognition (VPR) models perform well when airborne views match the training domain, adapting them to shifting distributions during sequential missions triggers catastrophic forgetting. Existing co…
arxiv
Score 25.0
2026-04-23 · Nicolae Filat, Ahmed Hussain, Konstantinos Kalogiannis, Elena Burceanu
Research Track A · General AI
Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same stream can induce d…
arxiv
Score 25.0
2026-05-01 · Beining Wu, Zihao Ding, Jun Huang
Research Track A · General AI
While current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) is built on the unverified assumption that routing isolates task-specific knowledge into disjoint experts, we argue that routing operates per-sample, while forgetting accumulates across the task sequence, and gr…
arxiv
Score 25.0
2026-06-04 · Kion Fallah, Silen Naihin, Barak Widawsky, Qingqing Mao
Research Track A · General AI
Deployed large language model agents must adapt to distribution shift in dynamic environments. Ideally, adaptation can be performed from accumulated agent experiences and retain prior capabilities while transferring to future tasks. However, agent actions and environmental transitions can only be sampled once per scena…
huggingface
Score 24.5
2026-04-15 · Xiaohua Wang, Muzhao Tian, Yuqi Zeng, Zisu Huang, Jiakang Yuan, Bowen Chen, Jingwen Xu, Mingbo Zhou, Wenhao Liu, Muling Wu, Zhengkang Guo, Qi Qian, Yifei Wang, Feiran Zhang, Ruicheng Yin, Shihan Dou, Changze Lv, Tao Chen, Kaitao Song, Xu Tan, Tao Gui, Xiaoqing Zheng, Xuanjing Huang
General AI
Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and multimodal large language models (MLLMs) toward human-preferred behaviors. However, these approaches introduce a systemic vulnerability: reward hacking, where models exploit…
huggingface
Score 24.5
2026-06-05 · Cong Chen, Guo Gan, Kaixiang Ji, ChaoYang Zhang, Zhen Yang, Guangming Yao, Hao Chen, Jingdong Chen, Yi Yuan, Chunhua Shen
General AI
Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion and attention dilution. To overcome this, we introduce MemDreamer to decouple perception and reasoning, shifting long-video understanding into an agentic exploration process…
arxiv
Score 24.5
2026-06-29 · Xuan Zhao, Haonan He, Qingyu Yang, Minglei Li, Jingqi Ye, Zelin Tan, Bo Wan, Peng Ye
Research Track A · General AI
Since intelligence fundamentally relies on efficient skill acquisition (Chollet, 2019), the ability to leverage skills is critical. For LLMs, skills, manually authored or extracted from task trajectories, are textual recipes encoding mature problem-solving experience and are critical to agentic capabilities. Despite wi…
arxiv
Score 24.4
2026-06-20 · Mohammed Rawhani, Dervis Karaboga, Ozkan Ufuk Nalbantoglu, Alper Basturk, Bahriye Akay
Research Track A · General AI
Pre-trained language models struggle when applied to new domains, as full fine-tuning is computationally expensive and prone to catastrophic forgetting. This study addresses this challenge by presenting a novel parameter-efficient strategy for unsupervised domain adaptation that combines custom PEFT architectures with …
arxiv
Score 24.3
2026-04-17 · Dian Shao, Zhengzheng Xu, Peiyang Wang, Like Liu, Yule Wang, Jieqi Shi, Jing Huo
General AI
UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi-step instructions over long horizons. Existing zero-shot methods remain limited, as they often rely on large base models, generic prompts, and loosely coordinated mod…
arxiv
Score 24.3
2026-04-21 · Shuai Wang, Hongyi Zhu, Jia-Hong Huang, Yixian Shen, Chengxi Zeng, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring
General AI
Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence groun…
arxiv
Score 24.3
2026-06-15 · Peiyang Xu, Bangzheng Li, Sijia Liu, Karthik R. Narasimhan, Pramod Viswanath, Prateek Mittal, Xingyu Fu
General AI
Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle detail in an image. We propose ContextRL, a context-aware reinforcement learning (RL) method that improves long-horizon r…
arxiv
Score 24.0
2026-03-16 · Zhaohui Geoffrey Wang
Research Track A · General AI
A critical failure mode of current lifelong agents is not lack of knowledge, but the inability to decide how to reason. When an agent encounters "Is this coin fair?", it must recognize whether to invoke frequentist hypothesis testing or Bayesian posterior inference, two frameworks that are epistemologically incompatible. M…
arxiv
Score 24.0
2026-04-09 · Yushuo Zhang, Yu Cheng, Yongkang Hu, Jiuan Zhou, Jiawei Chen, Yuan Xie, Zhaoxia Yin
Research Track A
The rapid advancement of facial forgery techniques poses severe threats to public trust and information security, making facial DeepFake detection a critical research priority. Continual learning provides an effective approach to adapt facial DeepFake detection models to evolving forgery patterns. However, existing met…
arxiv
Score 24.0
2026-04-27 · Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
Research Track A · General AI
Continual learning for large language models is typically evaluated through accuracy retention under sequential fine-tuning. We argue that this perspective is incomplete, because uncertainty reliability can degrade earlier and more sharply than top-1 performance. We study this empirically by measuring conformal coverag…
huggingface
Score 24.0
2026-05-28 · Alireza Salemi, Chang Zeng, Atharva Nijasure, Jui-Hui Chung, Razieh Rahimi, Fernando Diaz, Hamed Zamani
General AI
Large Language Model (LLM) search agents have shown strong promise for knowledge-intensive language tasks through multiple rounds of reasoning and information retrieval. Most existing systems access information using a retriever that takes a keyword or natural language query and returns a ranked list of documents using…
arxiv
Score 23.9
2026-06-22 · Amrita Singh, Rishabh Jha
Research Track A
Medical vision-language models (VLMs) such as BiomedCLIP generalize broadly, but adapting them to a clinical service is as much a safety problem as an accuracy one. Updating a deployed model for a new imaging modality can fail silently in two ways that harm patients: it can forget modalities it already handled (catastr…
arxiv
Score 23.8
2026-05-07 · Hanxiang Chao, Yihan Bai, Rui Sheng, Tianle Li, Yushi Sun
Research Track A · General AI
Large Language Model (LLM) agents are increasingly expected to maintain coherent, long-term personalized memory, yet current benchmarks primarily measure static fact retrieval, overlooking the ability to revise stored beliefs when new evidence emerges. We identify a critical and underexplored failure mode, Implicit Con…
huggingface
Score 23.5
2026-04-12 · Mikhail Menschikov, Dmitry Evseev, Victoria Dochkina, Ruslan Kostoev, Ilia Perepechkin, Petr Anokhin, Nikita Semenov, Evgeny Burnaev
General AI
Personalizing language models by effectively incorporating user interaction history remains a central challenge in the development of adaptive AI systems. While large language models (LLMs), combined with Retrieval-Augmented Generation (RAG), have improved factual accuracy, they often lack structured memory and fail to…
arxiv
Score 23.5
2026-04-16 · Cuong Hoang, Le-Minh Nguyen
Research Track A · General AI
The proliferation of financial misinformation poses a severe threat to market stability and investor trust, misleading market behavior and creating critical information asymmetry. Detecting such misleading narratives is inherently challenging, particularly in real-world scenarios where external evidence or supplementar…
arxiv
Score 23.3
2026-03-07 · Yunteng Tan, Zhi Gao, Xinxiao Wu
Research Track B · General AI
Large language model-based web agents have shown strong potential in automating web interactions through advanced reasoning and instruction following. While retrieval-based memory derived from historical trajectories enables these agents to handle complex, long-horizon tasks, current methods struggle to generalize acro…
arxiv
Score 23.3
2026-04-06 · Shu Wang, Edwin Yu, Oscar Love, Tom Zhang, Tom Wong, Steve Scargall, Charles Fan
General AI
Large Language Model (LLM) agents require persistent memory to maintain personalization, factual continuity, and long-horizon reasoning, yet standard context-window and retrieval-augmented generation (RAG) pipelines degrade over multi-session interactions. We present MemMachine, an open-source memory system that integr…
arxiv
Score 23.3
2026-06-09 · Jaewoo Lee, Zaid Khan, Archiki Prasad, Justin Chih-Yao Chen, Supriyo Chakraborty, Kartik Balasubramaniam, Sambit Sahu, Elias Stengel-Eskin, Hyunji Lee, Mohit Bansal
Research Track A · Research Track B · General AI
Various test-time interventions for Computer Use Agents (CUAs), including critic models, have been developed to improve performance through pre-execution action evaluation in complex Graphical User Interface (GUI) environments. However, existing critics suffer from two key limitations: they (1) focus primarily on short…
arxiv
Score 23.3
2026-06-17 · Zhenghao Xing, Ruiyang Xu, Yuxuan Wang, Jinzheng He, Ziyang Ma, Qize Yang, Yunfei Chu, Jin Xu, Junyang Lin, Chi-Wing Fu, Pheng-Ann Heng
General AI
Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre-scanning, and their context cost still …
arxiv
Score 22.8
2026-06-29 · Xuan Zhang, Wenxuan Zhang, See-Kiong Ng, Yang Deng
General AI
World models offer a principled way to equip long-horizon LLM agents with foresight: predictions of action consequences before execution. However, unreliable foresight can be ignored, misused, or even degrade downstream decision-making. In this paper, we introduce WorldEvolver, a self-evolving world model framework tha…
huggingface
Score 22.8
2026-07-02 · Xiangchen Cheng, Yunwei Jiang, Jianwen Sun, Zizhen Li, Chuanhao Li, Xiangcheng Cao, Yihao Liu, Fanrui Zhang, Li Jin, Kaipeng Zhang
General AI
Memory for a long-horizon LLM agent is a contract about what each future decision is allowed to see. The simplest contract appends past observations, tool calls, and reflections to every prompt, which makes prior context easy to access but also turns it into a jumbled mixture in which the effect of any single memory co…
arxiv
Score 22.5
2026-03-13 · Hongyang Chen, Zhongwu Sun, Hongfei Ye, Kunchi Li, Xuemin Lin
Research Track A · General AI
Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting, a critical limitation of the static pre-training paradigm inherent to modern LLMs. This survey presents a comprehensiv…
arxiv
Score 22.5
2026-03-31 · Michael Chertkov
Research Track A · General AI
An agent that operates sequentially must incorporate new experience without forgetting old experience, under a fixed memory budget. We propose a framework in which memory is not a parameter vector but a stochastic process: a Bridge Diffusion on a replay interval $[0,1]$, whose terminal marginal encodes the present and …
huggingface
Score 22.5
2026-04-06 · Jingyang Qiao, Weicheng Meng, Yu Cheng, Zhihang Lin, Zhizhong Zhang, Xin Tan, Jingyu Gong, Kun Shao, Yuan Xie
General AI
Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, while suffering from key li…
arxiv
Score 22.5
2026-04-08 · Radu Negulescu
Research Track A · General AI
Catastrophic forgetting is not an engineering failure. It is a mathematical consequence of storing knowledge as global parameter superposition. Existing methods, such as regularization, replay, and frozen subnetworks, add external mechanisms to a shared-parameter substrate. None derives retention from the learning dyna…
arxiv
Score 22.5
2026-04-14 · Jagadeesh Rachapudi, Ritali Vatsi, Praful Hambarde, Amit Shukla
Research Track A · General AI
Recent advances in deep learning underscore the need for systems that can not only acquire new knowledge through Continual Learning (CL) but also remove outdated, sensitive, or private information through Machine Unlearning (MU). However, while CL methods are well-developed, MU techniques remain in early stages, creati…
arxiv
Score 22.5
2026-04-28 · Dominik Żurek, Kamil Faber, Marcin Pietron, Paweł Gajewski, Roberto Corizzo
Research Track A · General AI
Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time, but adapting the model in live environment interactions is expensive, risky, …
arxiv
Score 22.5
2026-06-16 · Shiqi He, Yue Cui, Feijie Wu, Xinyu Ma, Jiaheng Lu, Yaliang Li, Bolin Ding, Mosharaf Chowdhury
Research Track B · General AI
Large language model (LLM) web agents are usually deployed as tool callers: each turn, the model reads a fresh page observation and emits one structured tool action. When every action is a low-level primitive, horizons grow quickly and so do policy-facing LLM completions, dominating latency and cost on benchmarks such …
arxiv
Score 22.4
2026-06-25 · Tianyi Men, Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
Research Track B · General AI
Multimodal web agents can assist humans in operating repetitive GUI tasks, where effective task planning is essential for decomposing complex tasks into executable actions. While small open source MLLMs are cost efficient and privacy preserving compared with commercial large models, they suffer from weak planning and l…
huggingface
Score 22.4
2026-06-26 · Yiling Tao, Shihan Deng, Meiling Tao, Pengzhi Wei, Zhichao Hu, Zhihao Zhu
General AI
Search agents powered by large language models (LLMs) are increasingly used to solve complex information-seeking tasks, requiring multi-step retrieval and reasoning to fulfill user goals. However, existing benchmarks often assume that user queries are complete and explicit, overlooking the fact that real-world search r…
arxiv
Score 22.3
2026-03-20 · Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette
Research Track B · General AI
Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing L…
arxiv
Score 22.3
2026-04-02 · Srivaths Ranganathan, Abhishek Dharmaratnakar, Anushree Sinha, Debanshu Das
General AI
Video recommender systems are among the most popular and impactful applications of AI, shaping content consumption and influencing culture for billions of users. Traditional single-model recommenders, which optimize static engagement metrics, are increasingly limited in addressing the dynamic requirements of modern pla…
arxiv
Score 22.3
2026-04-09 · Wenbo Hu, Xin Chen, Yan Gao-Tian, Yihe Deng, Nanyun Peng, Kai-Wei Chang
General AI
Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challenges: the extreme vari…
arxiv
Score 22.3
2026-04-21 · Josue Torres-Fonseca, Naihao Deng, Yinpei Dai, Shane Storks, Yichi Zhang, Rada Mihalcea, Casey Kennington, Joyce Chai
General AI
Multimodal Large Language Models are increasingly adopted as autonomous agents in interactive environments, yet their ability to proactively address safety hazards remains insufficient. We introduce SafetyALFRED, built upon the embodied agent benchmark ALFRED, augmented with six categories of real-world kitchen hazards…
arxiv
Score 22.3
2026-04-27 · Soyeon Kim, Cheongwoong Kang, Myeongjin Lee, Eun-Chul Chang, Jaedeok Lee, Jaesik Choi
General AI
The development of practical (multimodal) large language model assistants for Korean weather forecasters is hindered by the absence of a multidimensional, expert-level evaluation framework grounded in authoritative sources. To address this, we introduce K-MetBench, a diagnostic benchmark grounded in national qualificat…
arxiv
Score 22.3
2026-04-30 · Jing Zhang, Wentao Jiang, Tao Huang, Zhiwei Wang, Jianxin Liu, Jian Chen, Ping Ye, Gang Wang, Zengmao Wang, Bo Du, Dacheng Tao
General AI
Ultrasound interpretation requires both precise lesion localization and holistic clinical reasoning, yet existing methods typically excel at only one of these capabilities: specialized detectors offer strong localization but limited reasoning, whereas multimodal large language models (MLLMs) provide flexible reasoning …
arxiv
Score 22.3
2026-06-01 · Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai, Wenlin Yao, Hao Cheng, Baolin Peng, Huan Zhang, Tong Zhang, Jianfeng Gao
Research Track B · General AI
Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large collections of curated w…
arxiv
Score 22.3
2026-06-03 · Bo Mao, Jie Zhou, Yutao Yang, Xin Li, Xian Wei, Qin Chen, Xingjiao Wu, Liang He
Research Track A · General AI
Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. However, existing lifelong learning agents for long-horizon tasks typically depend on retrieving discrete skills or past experiences with static parameters during inference, which prevents them from contin…
arxiv
Score 22.3
2026-06-10 · Shang Ma, Jisheng Dang, Wencan Zhang, Yifan Zhang, Bimei Wang, Hong Peng, Bin Hu, Qi Tian, Tat-Seng Chua
General AI
We propose a multi-agent collaborative framework built upon a lightweight Multimodal Large Language Model (MLLM), specifically designed for social intelligence reasoning. A key feature of our approach is that both the training and inference phases are augmented via knowledge distillation. Within this architecture, mult…
arxiv
Score 22.3
2026-07-02 · Qianyu Chen, Canran Xiao, Runxuan Tang
Research Track A · General AI
Multimodal large language models must continually adapt to evolving tasks and domains, yet standard continual learning metrics mainly measure whether old answers remain correct, leaving the stability of multimodal grounding largely unexamined. We study this overlooked failure mode and ask whether a continually adapted …
arxiv
Score 22.2
2026-06-23 · Xinyu Mao, Yuhui Zeng, Xiaokun Liu, Wenyu Qin, Meng Wang, Xin Tao, Pengfei Wan, Xiaohan Xing, Max Meng
General AI
Cinematographic captioning aims to describe how a video is filmed using professional film-language concepts such as camera movement, shot size, depth of field, composition, and shooting angle. This capability is important for fine-grained video understanding and controllable movie-quality video generation, yet remains …
arxiv
Score 22.2
2026-06-23 · Tianyu Yang, Sudipta Paul, Vijay Srinivasan, Vivek Kulkarni, Srinivas Chappidi
Research Track A · General AI
Large language model (LLM) agents rely on long-term memory to support extended interactions and personalized assistance beyond finite context windows. Existing memory agents actively update external memory through generated write, revise, and delete operations, but these updates may omit important information, corrupt …
huggingface
Score 22.0
2026-05-11 · Shijue Huang, Hangyu Guo, Chenxin Li, Junting Lu, Xinyu Geng, Zhaochen Su, Zhenyu Li, Shuang Chen, Hongru Wang, Yi R. Fung
General AI
Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as transient outputs, …
arxiv
Score 22.0
2026-05-12 · Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, Joseph E. Gonzalez, Matei Zaharia, Kurt Keutzer, Inderjit S Dhillon, Rishabh Agarwal, Devvrit Khatri
Research Track A · General AI
Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM parameters can chea…
huggingface
Score 22.0
2026-05-21 · Jinho Park, Youbin Kim, Hogun Park, Eunbyung Park
General AI
Spatio-temporal reasoning is a core capability for Multimodal Large Language Models (MLLMs) operating in the real world. As such, evaluating it precisely has become an essential challenge. However, existing spatio-temporal reasoning benchmark datasets primarily rely on static image sets or passively curated video data,…
arxiv
Score 22.0
2026-05-27 · Cheng Chen, Pengpeng Zeng, Yuyu Guo, Lianli Gao, Hengtao Shen, Jingkuan Song
Research Track A · General AI
Low-Rank Adaptation (LoRA) has emerged as a promising paradigm for Continual Learning. It independently updates its low-rank factors ($A$ and $B$), creating a composite update to the full weight matrix through their interaction. To prevent catastrophic forgetting, this update should remain orthogonal to the task-specif…
arxiv
Score 22.0
2026-06-06 · Emre Alyamac, Himanshu Janmeda, Shashwat Krishna, Yash Vijay
Research Track A
Catastrophic forgetting, the abrupt loss of previously acquired knowledge upon learning new information, remains the central challenge in Continual Learning. This project investigates whether the order in which a model learns information affects how well it retains knowledge. Specifically, we ask: does learning general…
huggingface
Score 21.5
2026-03-20 · Chang Nie, Chaoyou Fu, Yifan Zhang, Haihua Yang, Caifeng Shan
General AI
Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture use…
arxiv
Score 21.5
2026-04-06 · Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta
Research Track A · General AI
Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, like full finetuning or parameter-efficient methods (e.g., LoRA), face a fundamental trade-off: cat…
arxiv
Score 21.5
2026-04-07 · Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz
Research Track B · General AI
Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully exe…
huggingface
Score 21.5
2026-04-20 · Xinping Lei, Xinyu Che, Junqi Xiong, Chenchen Zhang, Yukai Huang, Chenyu Zhou, Haoyang Huang, Minghao Liu, Letian Zhu, Hongyi Ye, Jinhua Hao, Ken Deng, Zizheng Zhan, Han Li, Dailin Li, Yifan Yao, Ming Sun, Zhaoxiang Zhang, Jiaheng Liu
General AI
Large language models are rapidly evolving into interactive coding agents capable of end-to-end web coding, yet existing benchmarks evaluate only narrow slices of this capability, typically text-conditioned generation with static-correctness metrics, leaving visual fidelity, interaction quality, and codebase-level reas…
arxiv
Score 21.5
2026-05-06 · Alaa Zniber, Ouassim Karrakchou, Mounir Ghogho
Research Track A · General AI
Early-exiting neural networks enable adaptive inference by allowing inputs to exit at intermediate classifiers, reducing computation for easy samples while maintaining high accuracy. In practice, exits can be trained sequentially by incrementally adding them to a shared backbone; however, this sequential training can c…
arxiv
Score 21.5
2026-05-12 · Xinrui Wang, Shao-Yuan Li, Bartłomiej Twardowski, Alexandra Gomez-Villa, Songcan Chen
Research Track A · General AI
Online Continual Learning (OCL) aims to learn from endless non-stationary data streams, yet most existing methods assume a flat label space and overlook the hierarchical organization of real-world concepts that evolves both horizontally (sibling classes) and vertically (coarse or fine categories). To bett…
arxiv
Score 21.5
2026-05-12 · Patryk Krukowski, Jacek Tabor, Przemysław Spurek, Marek Śmieja, Łukasz Struski
Research Track A · General AI
Data-free continual learning (DFCIL) relies on model inversion to synthesize pseudo-samples and mitigate catastrophic forgetting. However, existing inversion methods are fundamentally limited by a simplifying assumption: they model feature distributions using diagonal covariance, effectively ignoring correlations that …
arxiv
Score 21.5
2026-05-21 · Javad Parsa, Enis Simsar, Amir Joudaki, Thomas Hofmann, André M. H. Teixeira
Research Track A · General AI
Parameter-efficient fine-tuning enables fast personalization of text-to-image diffusion models, but composing multiple custom concepts remains challenging due to representation interference. Existing modular methods either rely on expensive post-hoc fusion or freeze adaptation subspaces, which limit expressiveness and …
arxiv
Score 21.5
2026-06-08 · Steven Vander Eeckt, Hugo Van hamme
Research Track A
Speech foundation models enable strong general-purpose ASR and are attractive for downstream adaptation. However, their size and the catastrophic forgetting induced by sequential fine-tuning demand parameter-efficient and regularized training methods, motivating parameter-efficient continual learning (PECL). While PECL…
arxiv
Score 21.5
2026-06-29 · Yiting Hu, Lingjie Duan, Qian Zhang
Research Track A
Machine unlearning aims to eliminate the influence of specific data from trained models to safeguard privacy. However, this presents a significant challenge in the context of continual learning (CL), where models update sequentially on dynamic datasets. A major limitation is that current certified unlearning algorithms…
huggingface
Score 21.4
2026-06-23 · Shiding Zhu, Yudi Qi, Yajie Wang, Jiaze Li, Chao Song, Yaorui Shi, Yibo Miao, Hanqi Gao, Kai Zhang
General AI
Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent executes tasks, summarizes outcomes, and determines memory content. This setup makes agents …
huggingface
Score 21.4
2026-06-23 · Yuxin Zuo, Zikai Xiao, Li Sheng, Fei Huang, Jianhong Tu, Yuxuan Liu, Tianyi Tang, Xiaomeng Hu, Yang Su, Qingfeng Lan, Yantao Liu, Qin Zhu, Yinger Zhang, Bowen Yu, Haiquan Zhao, Haiyang Xu, Jianxin Yang, Jiayang Cheng, Junyang Wang, Lianghao Deng, Mingfeng Xue, Tianyi Bai, Yang Fan, Yubo Ma, Yucheng Li, Zeyu Cui, Zhihai Wang, Zhihui Xie, Zhuorui Ye, An Yang, Dayiheng Liu, Jingren Zhou, Ning Ding
General AI
A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling based on language models can further push the boundaries of general agents. (i) We first focus on building foundation m…
arxiv
Score 21.3
2026-03-23 · Shoubin Yu, Lei Shu, Antoine Yang, Yao Fu, Srinivas Sunkara, Maria Wang, Jindong Chen, Mohit Bansal, Boqing Gong
Research Track B · General AI
Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution. However, current web-agent benchmarks suffer from a critical limitation: they focus entirely on web-based interaction and perception, lacking grounding in the user's real-world physical surroundings. This li…
arxiv
Score 21.3
2026-04-09 · Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang, Kunyu Shi, Guannan Zhang, Ruixuan Li, Yixiong Zou
General AI
The advent of agentic multimodal models has empowered systems to actively interact with external environments. However, current agents suffer from a profound meta-cognitive deficit: they struggle to arbitrate between leveraging internal knowledge and querying external utilities. Consequently, they frequently fall prey …
arxiv
Score 21.3
2026-04-22 · Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng, Fengying Xie, Zhiguo Jiang, Cao Liu, Ke Zeng, Haopeng Zhang
General AI
We introduce V-tableR1, a process-supervised reinforcement learning framework that elicits rigorous, verifiable reasoning from multimodal large language models (MLLMs). Current MLLMs trained solely on final outcomes often treat visual reasoning as a black box, relying on superficial pattern matching rather than perform…
arxiv
Score 21.3
2026-04-29 · GLM-V Team: Wenyi Hong, Xiaotao Gu, Ziyang Pan, Zhen Yang, Yuting Wang, Yue Wang, Yuanchang Yue, Yu Wang, Yanling Wang, Yan Wang, Xijun Liu, Wenmeng Yu, Weihan Wang, Wei Li, Shuaiqi Duan, Sheng Yang, Ruiliang Lv, Mingdao Liu, Lihang Pan, Ke Ning, Junhui Ji, Jinjiang Wang, Jing Chen, Jiazheng Xu, Jiale Zhu, Jiale Cheng, Ji Qi, Guobing Gan, Guo Wang, Cong Yao, Zijun Dou, Zihao Zhou, Zihan Wang, Zhiqi Ge, Zhijie Li, Zhenyu Hou, Zhao Xue, Zehui Wang, Zehai He, Yusen Liu, Yukuo Cen, Yuchen Li, Yuan Wang, Yijian Lu, Yanzi Wang, Yadong Xue, Xinyu Zhang, Xinyu Liu, Wenkai Li, Tianyu Tong, Tianshu Zhang, Shengdong Yan, Qinkai Zheng, Mingde Xu, Licheng Bao, Jiaxing Xu, Jiaxin Fan, Jiawen Qian, Jiali Chen, Jiahui Lin, Haozhi Zheng, Haoran Wang, Haochen Li, Fan Yang, Dan Zhang, Chuangxin Zhao, Chengcheng Wu, Boyan Shi, Bowei Jia, Baoxu Wang, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Minlie Huang, Yuxiao Dong, Jie Tang
General AI
We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, video…
arxiv
Score 21.3
2026-06-04 · Shangheng Du, Xiangchao Yan, Jinxin Shi, Zongsheng Cao, Shiyang Feng, Zichen Liang, Boyuan Sun, Tianshuo Peng, Yifan Zhou, Xin Li, Jie Zhou, Liang He, Bo Zhang, Lei Bai
General AI
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless search, and lack of hiera…
arxiv
Score 21.3
2026-06-08 · Zechen Sun, Yuyang Sun, Zecheng Tang, Juntao Li, Wenpeng Hu, Wenliang Chen, Zhunchen Luo, Guotong Geng, Min Zhang
General AI
Generating coherent and controllable long-form content remains a persistent challenge for Large Language Models (LLMs). While reasoning-enhanced models have demonstrated success in logic-intensive domains, our evaluation reveals that they suffer from a severe length collapse in open-ended writing, where performance deg…
arxiv
Score 21.3
2026-06-09 · Yv Zhang, Hao Sun, Hao Fang, Kuofeng Gao, Fan Mo, Bin Chen, Shu-Tao Xia, Yaowei Wang
Research Track B · General AI
External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However, this paradigm introduces a critical vulnerability: malicious content injected into memory can be persistently recalled and repeatedly influence agent behavior. In this wo…
arxiv
Score 21.3
2026-06-09 · Heming Zou, Qi Wang, Yun Qu, Yuhang Jiang, Lizhou Cai, Yixiu Mao, Ru Peng, Xin Xu, Weijie Liu, Kai Yang, Saiyong Yang, Xiangyang Ji
General AI
Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is often limited by insufficient reward contrast, arising when overly simple or complex prompts generate low-variance feedba…
arxiv
Score 21.3
2026-06-11 · Tanmoy Kanti Halder, Akash Ghosh, Subhadip Baidya, Arijit Roy, Sriparna Saha
General AI
Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where patients often express…
arxiv
Score 21.3
2026-06-11 · Xunhao Lai, Weiqi Xu, Yufeng Yang, Qiaorui Chen, Yang Xu, Lunbin Zeng, Xiaolong Li, Haohai Sun, Haichao Zhu, Vito Zhang, Pengyu Zhao
General AI
Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untenable at deployment sc…
arxiv
Score 21.3
2026-06-11 · King Yeung Tsang, Zihao Zhao, Vishal Venkataramani, Haizhou Shi, Zixuan Ke, Semih Yavuz, Shafiq Joty, Hao Wang
General AI
Multi-Agent Systems (MAS) built on Large Language Models (LLMs) require effective orchestration to coordinate specialized agents, yet training such orchestrators is hindered by limited supervision and high computational cost. We propose Orchestration Reward Modeling (OrchRM), a self-supervised framework for evaluating …
arxiv
Score 21.3
2026-06-17 · Shengyuan Ding, Xilin Wei, Xinyu Fang, Haodong Duan, Dahua Lin, Jiaqi Wang, Yuhang Zang
Research Track A · General AI
Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. However, existing benchmarks either expose the full state, conflate hidden-state reconstruction with other agent skills, or test recall only after an episode has ended. W…
arxiv
Score 21.2
2026-06-23 · Wei Zhou, Xuanhe Zhou, Shaokun Han, Hongming Xu, Guoliang Li, Zhiyu Li, Feiyu Xiong, Fan Wu
Research Track A · General AI
Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that supports persistent information storage, retrieval, update, consolidation, and dynamic lifecycle governance throughout agent execution. Despite this evolution, existing evaluati…
arxiv
Score 21.2
2026-06-24 · Akshay Paruchuri, Sanmi Koyejo, Ehsan Adeli
General AI
Standard benchmarks for multimodal large language models (MLLMs) score each item on one canonical ordering and miss whether order-irrelevant shuffling changes the answer, a baseline reliability property called for by emerging AI evaluation guidelines. We introduce Facet-Probe, a five-facet audit (option, evidence-chunk…
arxiv
Score 21.0
2026-04-22 · Noah Flynn
Research Track A · General AI
Large language models (LLMs) often exhibit performance disparities across languages, with naive multilingual fine-tuning frequently degrading performance due to negative cross-lingual interference. To address this, we introduce COMPASS (COntinual Multilingual PEFT with Adaptive Semantic Sampling), a novel data-centric …
huggingface
Score 21.0
2026-05-05 · You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei
General AI
Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines that can perceive, generate, and interact in the multimodal real world. In the era of large foundation models, joint modeling of audio and vision has become increasing…
arxiv
Score 21.0
2026-06-11 · Zhibao Chen, Qian Cheng
Research Track A · General AI
Long-running LLM agents accumulate interaction histories far larger than any context window, forcing a standing decision: what to encode deeply, what to forget, and what to retrieve under a fixed memory budget. Production systems answer with semantic similarity or recency -- both mis-specified for the forgetting decisi…
huggingface
Score 21.0
2026-06-15 · Yongjia Lei, Nedim Lipka, Zhisheng Qi, Utkarsh Sahu, Koustava Goswami, Franck Dernoncourt, Ryan A. Rossi, Yu Wang
General AI
Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit and complex reasoning beyond surface-level semantic or lexical matching (e.g., mathematical problems relying on the same theorem or codin…
huggingface
Score 21.0
2026-06-30 · Junha Jung, Minbyul Jeong, Suhyeon Lim, Sungwook Jung, Jaehoon Yun, Taeyun Roh, Mujeen Sung, Jaewoo Kang
General AI
Recent multimodal large language models have shown great promise in clinical image reasoning, but existing post-training pipelines remain predominantly outcome-centric, relying on final answer correctness or sequence-level preferences. This suffers from sparse credit assignment, making it difficult to optimize the reas…
arxiv
Score 20.8
2026-05-07 · Bodong Du, Bowen Liu, Yang Yu, Xinpeng Ding, Zhiheng Wu, Shuning Wang, Shuo Nie, Naiming Liu, Qifeng Chen, Yangqiu Song, Xiaomeng Li
General AI
Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures contain highly redundant anatomical views, while decisive evidence is temporally sparse,…
arxiv
Score 20.8
2026-05-12 · Seokwon Jung, Alexander Rubinstein, Arnas Uselis, Sangdoo Yun, Seong Joon Oh
General AI
LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not …
arxiv
Score 20.8
2026-05-29 · Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li
General AI
Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has shown promise for this task, yet existing methods are limited by low-confusability distractor…
arxiv
Score 20.8
2026-05-29 · Yulu Pan, Han Yi, Seongsu Ha, Md Mohaiminul Islam, Benjamin Zhang, Lorenzo Torresani, Gedas Bertasius
General AI
True video intelligence demands more than recognizing what is visible: it requires reasoning about why events unfold, predicting what would change under different conditions, and deciding what to do next. We refer to this progression, from perception through causal reasoning and simulation to strategic planning, as Str…
arxiv
Score 20.6
2026-07-02 · Liyan Tang, Fangcong Yin, Greg Durrett
General AI
Large vision-language models can reason over multimodal inputs by generating textual chains of thought (CoT). A key capability exhibited in CoT reasoning is self-reflection: revisiting earlier decisions and correcting previous errors. However, existing LVLMs often fail to properly attend to visual inputs during reflect…
huggingface
Score 20.5
2026-03-26 · Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Yong Liu, Chi Zhang
General AI
This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) in solving real-world financial problems through tool invocation of financial model context protocols. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic us…
huggingface
Score 20.5
2026-03-30 · Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen
General AI
Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasi…
huggingface
Score 20.5
2026-03-30 · Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo
General AI
Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bo…
huggingface
Score 20.5
2026-04-09 · Chuzhan Hao, Wenfeng Feng, Guochao Jiang, Guofeng Quan, Guohua Liu, Yuewei Zhang
General AI
Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcom…
huggingface
Score 20.5
2026-04-13 · Junfu Pu, Yuxin Chen, Teng Wang, Ying Shan
General AI
Current multimodal large language models (MLLMs) have demonstrated remarkable capabilities in short-form video understanding, yet translating long-form cinematic videos into detailed, temporally grounded scripts remains a significant challenge. This paper introduces the novel video-to-script (V2S) task, aiming to gener…
huggingface
Score 20.5
2026-04-18 · Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Dingyu Yao, Zheng Lin, Peng Fu, Nan Duan, Jiaqi Wang
General AI
Reinforcement learning from verifiable rewards (RLVR) has demonstrated remarkable effectiveness in improving the reasoning capabilities of large language models. As models evolve into natively multimodal architectures, extending RLVR to video understanding becomes increasingly important yet remains largely unexplored, …
huggingface
Score 20.5
2026-04-22 · Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha
General AI
Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision-making under delayed rewards and partial observability. Games are a good testbed for evaluating agent …
arxiv
Score 20.5
2026-04-27 · Kevin McKee, Thomas Hazy, Yicong Zheng, Zacharie Bugaud, Thomas Miconi
Research Track A · General AI
Block-sequential continual learning demands that a single model both protect prior solutions from catastrophic forgetting and efficiently infer at inference time which prior solution matches the current input without task labels. We present Functional Task Networks (FTN), a parameter-isolation method inspired by struct…
arxiv
Score 20.5
2026-05-08 · Donguk Kwon, Dongha Lee
Research Track B · General AI
Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as the action space, leaving the page's functional organization implicit and forcing the agent to infer it from element-leve…
arxiv
Score 20.5
2026-05-14 · Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu, Sylvie Venuto, Jinhao Zhu, Raluca Ada Popa, David Wagner
Research Track B · General AI
ReAct has become the default architecture across LLM agents, and many existing web agents follow this paradigm. We argue that it is the wrong default for web agents. Instead, web agents should default to plan-then-execute: commit to a task-specific program before observing runtime web content, then execute it. The reas…
arxiv
Score 20.5
2026-05-27 · Guangyu Li, Meng Ding, Lijie Hu
Research Track A · General AI
In-context learning (ICL) derives its power from enabling Large Language Models to adapt to new tasks via prompt-based reasoning alone, entirely bypassing the need for parameter updates. Existing theories primarily study ICL in single-task settings, while real-world prompts often contain sequences of heterogeneous task…
huggingface
Score 20.5
2026-06-05 · Lingyong Yan, Can Xu, Yukun Zhao, Wenxuan Li, Qingyang Chen, Jiulong Wu, Wenli Song, Xiangnan Li, Weixian Shi, Yiqun Chen, Xuchen Ma, Yuchen Li, Jiashu Zhao, Shuaiqiang Wang, Jianmin Wu, Dawei Yin
General AI
Deep Research (DR) has emerged as a new agentic paradigm to tackle complex, open-ended research tasks, demanding systems that can iteratively frame problems, acquire evidence, verify sources, and synthesize long-form reports. In practice, however, current DR systems are constrained by four interrelated limitations: lon…
arxiv
Score 20.5
2026-06-10 · Longkun Hao, Hongyu Lin, Hao Li, Zhichao Yang, Haojie Hao, Dongshuo Huang, Haitao Yang, Hongyu Ge, Ming jie Xie, Yanjun Wu, Zi Hao Yin, Yan Bai, Yihang Lou
Research Track B · General AI
Training interactive web agents through imitation learning from expert trajectories has emerged as a highly effective approach. However, determining the optimal timing for expert intervention presents a critical challenge in this context. Delayed intervention often leads to the accumulation of early-stage errors, pushi…
arxiv
Score 20.3
2026-04-01 · Haiyang Guo, Yichen Shi, Fei Zhu, Wenzhuo Liu, Hongbo Zhao, Fanhu Zeng, Shijie Ma, Da-Han Wang, Xu-Yao Zhang
Research Track A · General AI
Video Large Language Models (Video-LLMs) require continual learning to adapt to non-stationary real-world data. However, existing benchmarks fall short of evaluating modern foundation models: many still rely on models without large-scale pre-training, and prevailing benchmarks typically partition a single dataset into …
arxiv
Score 20.3
2026-04-07 · Md Shamimul Islam, Luis G. Jaimes, Ayesha S. Dina
Research Track A · General AI
Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods are effective for known attack patterns, but they struggle to detect zero-day attacks and often miss modified variants of previously known attacks, while many machine learning approaches offer limited interpretability. These …
arxiv
Score 20.3
2026-04-13 · Keyang Zhong, Junlin Xie, Hefeng Wu, Haofeng Li, Guanbin Li
General AI
Vision-language models (VLMs) have shown impressive capabilities in perceptual tasks, yet they degrade in complex multi-hop reasoning under multiplayer game settings with imperfect and deceptive information. In this paper, we study a representative multiplayer task, Murder Mystery Games, which require inferring hidden …
arxiv
Score 20.3
2026-04-14 · Zhaofen Wu, Hanrong Zhang, Fulin Lin, Wujiang Xu, Xinran Xu, Yankai Chen, Henry Peng Zou, Shaowen Chen, Weizhi Zhang, Xue Liu, Philip S. Yu, Hongwei Wang
Research Track A · General AI
To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete…
arxiv
Score 20.3
2026-04-20 · Xingchen Xiao, Heyan Huang, Runheng Liu, Jincheng Xie
General AI
Large language models (LLMs) are widely used in retrieval-augmented generation (RAG) to incorporate external knowledge at inference time. However, when retrieved contexts are noisy, incomplete, or heterogeneous, a single generation process often struggles to reconcile evidence effectively. We propose MASS-RAG,…
arxiv
Score 20.3
2026-04-24 · Chih-Ting Liao, Xi Xiao, Chunlei Meng, Zhangquan Chen, Yitong Qiao, Weilin Zhou, Tianyang Wang, Xu Zheng, Xin Cao
General AI
Multimodal large language models (MLLMs) have advanced static visual-spatial reasoning, yet they often fail to preserve long-horizon spatial coherence in embodied settings where beliefs must be continuously revised from egocentric observations under environmental change. We introduce SpaMEM (Spatial Memory from Action…
arxiv
Score 20.3
2026-04-29 · Mingji Ge, Qirui Chen, Zeqian Li, Weidi Xie
General AI
Long-term video understanding requires interpreting complex temporal events and reasoning over procedural activities. While instructional video corpora, like HowTo100M, offer rich resources for model training, they present significant challenges, including noisy ASR transcripts and inconsistent temporal alignments betw…
arxiv
Score 20.3
2026-06-04 · Yasmine Omri, Ziyu Gan, Zachary Broveak, Robin Geens, Zexue He, Alex Pentland, Marian Verhelst, Tsachy Weissman, Thierry Tambe
General AI
LLM agents are increasingly deployed on long-horizon tasks requiring sustained reasoning over extended interaction histories. Realizing this at scale requires agents to persistently store, retrieve, and update their own memory across sessions. A rich ecosystem of agent memory systems has emerged spanning flat retrieval…
arxiv
Score 20.3
2026-06-04 · Yuxiao Ye, Haoran He, Fangyuan Kong, Xintao Wang, Pengfei Wan, Kun Gai, Ling Pan
General AI
Text-guided image editing has advanced rapidly with diffusion models and unified multimodal foundation models. However, most existing methods remain confined to single-turn settings, overlooking the more realistic scenario of multi-turn in-context editing, where users iteratively refine an image through a sequence of i…
arxiv
Score 20.3
2026-06-07 · Ruoyu Yao, Pei Liu, Ruiguo Zhong, Mingxing Peng, Rui Yang, Jun Ma
Research Track A · General AI
While large language models (LLMs) offer promising reasoning capabilities, their integration into safety-critical driving systems is hindered by limited reasoning diversity, high computational overhead, and static learning paradigms. To address these challenges, we propose LUNA-AD, a lightweight uncertainty-aware langu…
arxiv
Score 20.3
2026-06-15 · Zhiqiang Zhou, Junliang Dai, Xu Ling
General AI
Multimodal large language models (MLLMs) excel at visual reasoning but rely on text-based chain-of-thought (CoT), lacking interpretable visual intermediates. Existing methods use opaque tokens or external tools, missing key properties. We propose Gen-VCoT, a framework using expert vision models to generate RGB images a…
arxiv
Score 20.2
2026-06-23 · Wenxin Wang, Bo Zhang, Feng Chen, Zixuan Wang, Wen Li, Changsheng Li, Yinjie Lei
General AI
Recent advancements have explored agentic zero-shot 3D understanding by reformulating it as video keyframe understanding with Multimodal Large Language Models (MLLMs). However, existing methods face an intrinsic bottleneck due to the finite observation perspectives inherent in videos and the implicit perception of 3D s…
arxiv
Score 20.2
2026-06-24 · Yu-Yang Chen, Lan-Zhe Guo
General AI
Multimodal Large Language Models (MLLMs) demonstrate strong performance on standard visual question answering benchmarks, yet their scalability under controlled structural complexity remains poorly understood. We introduce TriViewBench, a controlled three-view visual reasoning benchmark constructed from synthetic 3D sc…
arxiv
Score 20.0
2026-03-13 · Orit Shahnovsky, Rotem Dror
Research Track B · General AI
Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why they fail or how they plan. This paper addresses this gap by formally treating web tasks as sequ…
arxiv
Score 20.0
2026-04-14 · Zhaoyang Wang, Qianhui Wu, Xuchao Zhang, Chaoyun Zhang, Wenlin Yao, Fazle Elahi Faisal, Baolin Peng, Si Qin, Suman Nath, Qingwei Lin, Chetan Bansal, Dongmei Zhang, Saravan Rajmohan, Jianfeng Gao, Huaxiu Yao
Research Track B · General AI
Autonomous web agents powered by large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. A key bottleneck is the grounding gap in existing skill formulations: textual workflow skills provide natural language guidance but cannot be directl…
arxiv
Score 20.0
2026-04-19 · Liangzu Peng, Uday Kiran Reddy Tadipatri, Ziqing Xu, Eric Eaton, René Vidal
Research Track A · General AI
Continual learning (CL) is concerned with learning multiple tasks sequentially without forgetting previously learned tasks. Despite substantial empirical advances over recent years, the theoretical development of CL remains in its infancy. At the heart of developing CL theory lies the challenge that the data distributi…
arxiv
Score 20.0
2026-04-20 · Lingfeng Zhang, Yongan Sun, Jinpeng Hu, Hui Ma, Yang Ying, Kuien Liu, Zenglin Shi, Meng Wang
Research Track B · General AI
Recent advancements in large language models (LLMs) have empowered autonomous web agents to execute natural language instructions directly on real-world webpages. However, existing agents often struggle with complex tasks involving dynamic interactions and long-horizon execution due to rigid planning strategies and hal…
arxiv
Score 20.0
2026-04-22 · Yingjie Gu, Bo Xiong, Yijuan Guo, Chao Li, Xiaojing Zhang, Liqiang Wang, Pengcheng Ren, Qi Sun, Jingyao Ma, Shidang Shi
Research Track A · General AI
For LLM agents, memory management critically impacts efficiency, quality, and security. While much research focuses on retention, selective forgetting, inspired by human cognitive processes (hippocampal indexing/consolidation theory and the Ebbinghaus forgetting curve), remains underexplored. We argue that in resource-cons…
arxiv
Score 20.0
2026-04-29 · Fazle Elahi Faisal, Qianhui Wu, Baolin Peng, Jianfeng Gao
Research Track B · General AI
Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-quality web trajectory training data. Existing automatic trajectory generation methods suffer from incomplete website cov…
arxiv
Score 20.0
2026-05-03 · Matteo Gambella, Fabrizio Pittorino, Manuel Roveri
Research Track A · General AI
Neural Architecture Search (NAS) has emerged as a powerful framework for automatically discovering neural architectures that balance accuracy and efficiency. However, as AI transitions from static benchmarks to real-world deployment, the traditional focus on hardware-aware efficiency is no longer sufficient. We observe…
arxiv
Score 20.0
2026-05-07 · Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari
Research Track A · General AI
Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in thre…
arxiv
Score 20.0
2026-05-25 · Bowen Wang, Dunjie Lu, Junli Wang, Tianyi Bai, Shixuan Liu, Zhipeng Zhang, Haiquan Wang, Hao Hu, Tianbao Xie, Shuai Bai, Dayiheng Liu, Que Shen, Junyang Lin, Tao Yu
Research Track B · General AI
Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use, and software engineering, yet its extension to computer-use agents (CUAs) has been bottlenecked by the scarcity of scalable training data with deterministic rewards. Constructing such data for CUAs requires…
arxiv
Score 20.0
2026-05-28 · Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù, Leila Kosseim
Research Track B · General AI
Despite recent advances, LLM-based web agents still struggle with limited exploration, omission of critical steps, and sensitivity to task constraints. Prior work suggests that many of these failures stem from weaknesses in planning, yet the impact of alternative natural language plan representation remains unexplored.…
arxiv
Score 19.9
2026-06-23 · Ahmed Anwar, Andreas Wagner, Federico Raue, Tobias Nauen, Andreas Dengel
Research Track A
Accuracy degradation is the standard metric for Catastrophic Forgetting (CF); however, it records only whether forgetting occurred. It saturates at the extremes and collapses discretely at task boundaries, hiding the internal structure of what is being forgotten. We introduce six softmax-derived metrics spanning…
arxiv
Score 19.8
2026-05-01 · Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson
General AI
Single-agent systems (SAS) have become the default pattern for LLM-driven scientific workflows, but routing planning, tool use, and synthesis through a single context window comes with a well-known cost: as tool specifications and observational traces accumulate, the effective context available for each decision shrink…
arxiv
Score 19.8
2026-05-20 · Wujiang Xu, Yu Wang, Kai Mei, Kaiqu Liang, Zhenting Wang, Mingyu Jin, Han Zhang, Shi-Xiong Zhang, Wenyue Hua, Sambit Sahu, Dimitris N. Metaxas
Research Track A · Research Track B · General AI
Memory is a central capability for LLM agents operating across long-horizon tasks. Existing memory benchmarks predominantly evaluate retention of personalized information in multi-turn chat scenarios, overlooking the dynamic memory formation that occurs during extended agent execution. Consequently, the memory systems …
arxiv
Score 19.8
2026-06-16 · Xuelong Dai, Jianyu Ma, Boyang Ma, Biwei Yan, Yijun Yang, Yue Zhang
Research Track B · General AI
Multimodal Large Language Model (MLLM)-based web agents provide practical, high-precision solutions for visual browser automation; however, they inherently expand the attack surface, introducing novel vision-based vulnerabilities. Existing adversarial evaluations targeting these agents frequently rely on permissive thr…
arxiv
Score 19.8
2026-06-27 · Dianwei Chen, Yuan-Zheng Lei, Zifan Zhang, Yuchen Liu, Xianfeng Yang
General AI
Recent advancements in generative artificial intelligence (AI) and large language models (LLMs) have shown significant promise in automating complex reasoning, summarization, and question-answering tasks. However, the effectiveness of general-purpose LLMs in specialized engineering domains remains limited due to insuff…
arxiv
Score 19.8
2026-06-29 · Haocong He, Chenfei Liao, Zichen Wen, Zihao Dongfang, Xu Zheng, Bin Ren, Chang Su, Zixin Zhang, Harold Haodong Chen, Hongfei Zhang, Weijia Li, Kailun Yang, Conghui He, Xuming Hu, Nicu Sebe, Linfeng Zhang
General AI
Multimodal Large Language Models (MLLMs) have demonstrated promising spatial reasoning capabilities, while these abilities remain underexplored in the emerging visual modality of panoramic imagery. The full 360°×180° field of view of panoramas essentially supports complex global multi-step reasoning, which is al…
huggingface
Score 19.8
2026-07-01 · Aryo Pradipta Gema, Beatrice Alex, Pasquale Minervini
General AI
In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than literally copy-pasting them. Identifying which attention heads perform this synthesis matters for interpreting long-context model behavior. Yet existing detectors miss these heads by construc…
arxiv
Score 19.6
2026-07-02 · Yanjun Zhao, Ruizhong Qiu, Tianxin Wei, Yuanchen Bei, Zhining Liu, Lingjie Chen, Ismini Lourentzou, Hanghang Tong, Jingrui He
General AI
Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they often fail to use relevant evidence that is already present in the input, revealing a gap between context…
huggingface
Score 19.5
2026-04-09 · Chonghan Qin, Xiachong Feng, Weitao Ma, Xiaocheng Feng, Lingpeng Kong
General AI
Existing memory benchmarks for LLM agents evaluate explicit recall of facts, yet overlook implicit memory where experience becomes automated behavior without conscious retrieval. This gap is critical: effective assistants must automatically apply learned procedures or avoid failed actions without explicit reminders. We…
huggingface
Score 19.5
2026-04-13 · CocoaBench Team, Shibo Hao, Zhining Zhang, Zhiqi Liang, Tianyang Liu, Yuheng Zha, Qiyue Gao, Jixuan Chen, Zilong Wang, Zhoujun Cheng, Haoxiang Zhang, Junli Wang, Hexi Jin, Boyuan Zheng, Kun Zhou, Yu Wang, Feng Yao, Licheng Liu, Yijiang Li, Zhifei Li, Zhengtao Han, Pracha Promthaw, Tommaso Cerruti, Xiaohan Fu, Ziqiao Ma, Jingbo Shang, Lianhui Qin, Julian McAuley, Eric P. Xing, Zhengzhong Liu, Rupesh Kumar Srivastava, Zhiting Hu
General AI
LLM agents now perform strongly in software engineering, deep research, GUI automation, and various other applications, while recent agent scaffolds and models are increasingly integrating these capabilities into unified systems. Yet, most evaluations still test these capabilities in isolation, which leaves a gap for m…
huggingface
Score 19.5
2026-04-16 · Qianqian Xie, Qingheng Xiong, He Zhu, Tiantian Xia, Xueming Han, Fanyu Meng, Jiakai Wang, Zhiqi Bai, Chengkang Jiang, Zhaohui Wang, Yubin Guo, Yuqing Wen, Jiayang Mao, Zijie Zhang, Shihao Li, Yanghai Wang, Yuxiang Ren, Junlan Feng, Jiaheng Liu
General AI
Deep Research Agents (DRAs) aim to solve complex, long-horizon research tasks involving planning, retrieval, multimodal understanding, and report generation, yet their evaluation remains challenging due to dynamic web environments and ambiguous task definitions. We propose DR³-Eval, a realistic and reproducible benc…
huggingface
Score 19.5
2026-04-16 · Jun Wang, Shuo Tan, Zelong Sun, Tiancheng Gu, Yongle Zhao, Ziyong Feng, Kaicheng Yang, Cewu Lu
General AI
Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG systems typically rely on generic retrieval signals that overlook the fine-grained visual semantics essential for complex reasoning. To address this limitation, we propose UniDo…
arxiv
Score 19.5
2026-04-24 · Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song
Research Track A · General AI
Many continual-learning methods modify gradients upstream (e.g., projection, penalty rescaling, replay mixing) while treating Adam as a neutral backend. We show this composition has a hidden failure mode. In a high-overlap, non-adaptive 8-domain continual LM, all shared-routing projection baselines collapse close to va…
arxiv
Score 19.5
2026-05-06 · Yazheng Liu, Yuxuan Wan, Rui Xu, Xi Zhang, Sihong Xie, Hui Xiong
Research Track A · General AI
Large language models (LLMs) often suffer from catastrophic forgetting in continual learning: after learning new tasks sequentially, they perform worse on earlier tasks. Existing methods mitigate catastrophic forgetting by data replay, parameter freezing, or regularization. However, these methods lack semantic awarenes…
arxiv
Score 19.5
2026-06-04 · Parth Asawa, Christopher M. Glaze, Gabriel Orlanski, Ramya Ramakrishnan, Benji Xu, Asim Biswal, Vincent Sunn Chen, Frederic Sala, Matei Zaharia, Joseph E. Gonzalez
Research Track A · General AI
Continual learning, the ability of AI systems to improve through sequential experience, has attracted substantial interest, but no high-quality benchmark exists to evaluate it. We introduce Continual Learning Bench (CL-Bench), the first difficult, expert-validated benchmark designed to measure whether LLM-based systems…
huggingface
Score 19.5
2026-06-14 · Jingru Guo, Xiangyuan Xue, Lian Zhang, Wanghan Xu, Siki Chen, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin
General AI
Frontier scientific reasoning remains a major challenge for large language models (LLMs), where even the strongest commercial systems fall short of expert-level performance. A closer look at model behavior reveals substantial complementarity that single-model evaluation hides: different frontier models excel on differe…
huggingface
Score 19.5
2026-06-16 · Haowen Liu, Xirui Li, Shaoxiong Yao, Peng Shi, Tianyi Zhou, Jia-Bin Huang, Furong Huang, Jiayuan Mao
General AI
Language models trained on large-scale vision-language data have demonstrated strong potential for embodied agents. Harnessing models through embodied tool use offers a promising alternative to end-to-end vision-language-action systems by combining high-level reasoning with external modules for perception, planning, a…
arxiv
Score 19.5
2026-06-19 · Yu Luo
Research Track A · General AI
Social intelligence is a core competency for language agents, yet current research primarily focuses on static capability evaluation rather than how these skills are continuously shaped and accumulated. This gap calls for a shift toward sustainable learning paradigms. Currently, two methodological pain points exist: so…
arxiv
Score 19.4
2026-06-24 · Zhihao Gu, Lin Wang
Research Track A · General AI
Building a generalist robot that can leverage prior knowledge for continuous task adaptation remains a significant challenge. Previous works alleviate the catastrophic forgetting problem by parameter-efficient fine-tuning for single-task adaptation. However, they fail to extract reusable skills and model the interactio…
huggingface
Score 19.4
2026-06-26 · Shoufa Chen, Luyuan Wang, Xuan Yang, Zhiheng Liu, Yuren Cong, Yuanfeng Ji, Feiyan Zhou, Xiaohui Zhang, Fanny Yang, Belinda Zeng
General AI
As large language models and harness frameworks continue to advance, agents operating in terminals are increasingly capable of performing a broader range of general computer-use tasks beyond coding. However, existing benchmarks do not adequately evaluate general-purpose terminal computer-use agents (TUAs): general comp…
huggingface
Score 19.4
2026-06-26 · Hohin Kwan, Hongyu Li, Ray Zhang, Manyuan Zhang, Xianghao Kong, Anyi Rao, Jiahao Xie, Si Liu
General AI
Recent interest in multimodal large language models (MLLMs) raises a central question: can they reason over dynamic visual evidence rather than merely recognize objects or events in individual frames? This ability, which we refer to as video temporal-logical reasoning, requires models to maintain, update, and compose e…
arxiv
Score 19.3
2026-04-06 · Shiek Ruksana, Sailesh Kiran Kurra, Thipparthi Sanjay Baradwaj
General AI
Large Language Models (LLMs) have shown strong performance across a wide range of natural language processing tasks; however, their effectiveness is highly dependent on prompt design, structure, and embedded reasoning signals. Conventional prompt engineering methods largely rely on heuristic trial-and-error processes, …
arxiv
Score 19.3
2026-04-09 · Ziwei Zhou, Zeyuan Lai, Rui Wang, Yifan Yang, Zhen Xing, Yuqing Yang, Qi Dai, Lili Qiu, Chong Luo
General AI
Text-to-Audio-Video (T2AV) generation is rapidly becoming a core interface for media creation, yet its evaluation remains fragmented. Existing benchmarks largely assess audio and video in isolation or rely on coarse embedding similarity, failing to capture the fine-grained joint correctness required by realistic prompt…
arxiv
Score 19.3
2026-04-09 · Boer Zhang, Mingyan Wu, Dongzhuoran Zhou, Yuqicheng Zhu, Wendong Fan, Puzhen Zhang, Zifeng Ding, Guohao Li, Yuan He
Research Track B · General AI
Deep research requires reasoning over web evidence to answer open-ended questions, and it is a core capability for AI agents. Yet many deep research agents still rely on implicit, unstructured search behavior that causes redundant exploration and brittle evidence aggregation. Motivated by Anthropic's "think" tool parad…
arxiv
Score 19.3
2026-04-14 · Benjamin Stern, Peter Nadel
General AI
LLM agents with persistent memory store information as flat factual records, providing little context for temporal reasoning, change tracking, or cross-session aggregation. Inspired by the drawing effect [3], we introduce dual-trace memory encoding. In this method, each stored fact is paired with a concrete scene trace…
arxiv
Score 19.3
2026-04-21 · Md Nayem Uddin, Kumar Shubham, Eduardo Blanco, Chitta Baral, Gengyu Wang
Research Track A · General AI
Personalized agents that interact with users over long periods must maintain persistent memory across sessions and update it as circumstances change. However, existing benchmarks predominantly frame long-term memory evaluation as fact retrieval from past conversations, providing limited insight into agents' ability to …
arxiv
Score 19.3
2026-04-29 · Happy Bhati
General AI
The arrival of large language models (LLMs) capable of multi-step reasoning, tool use, and long-horizon planning has produced a qualitative shift in software engineering. Where earlier code-completion tools such as GitHub Copilot operated at the granularity of a line or function, modern agentic systems -- Claude Code, …
arxiv
Score 19.3
2026-05-01 · Derong Xu, Shuochen Liu, Pengfei Luo, Pengyue Jia, Yingyi Zhang, Yi Wen, Yimin Deng, Wenlin Zhang, Enhong Chen, Xiangyu Zhao, Tong Xu
General AI
Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memor…
arxiv
Score 19.3
2026-06-08 · Yimu Wang, Yee Man Choi, Barry Zhang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki
General AI
Multimodal large language models (MLLMs) achieve strong results on visual reasoning benchmarks, but answer accuracy alone does not indicate whether a model relied on the correct visual evidence. This gap is particularly important in multi-view driving scenes used for autonomous driving, where a model can produce a plau…
arxiv
Score 19.2
2026-06-24 · Mingguang Chen, Bo Qu
General AI
Large language models are increasingly deployed as investment research assistants, yet no benchmark tests whether they can accurately reconstruct and apply the specific procedural decision frameworks of expert investors. We introduce InvestPhilBench, a multi-layer dynamic benchmark spanning eight cognitive tiers, from …
arxiv
Score 19.2
2026-06-24 · Changdae Oh, Wendi Li, Seongheon Park, Samuel Yeh, Tanwi Mallick, Sharon Li
General AI
Process reward models enable fine-grained, step-level evaluation of LLMs, yet building them for agentic settings remains prohibitively difficult: long-horizon interactions, irreversible actions, and stochastic environment feedback make both human annotation and Monte Carlo estimation infeasible at scale. In this work, …
arxiv
Score 19.2
2026-06-24 · Yupu Hao, Zhuoran Jin, Huanxuan Liao, Kang Liu, Jun Zhao
General AI
Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited gains in tool-use tasks. In our experiments, some models exhibit catastrophic collapse, wh…
arxiv
Score 19.0
2026-03-30 · Tiantian Wang, Xiang Xiang, Simon S. Du
Research Track A · General AI
In federated healthcare systems, Federated Class-Incremental Learning (FCIL) has emerged as a key paradigm, enabling continuous adaptive model learning among distributed clients while safeguarding data privacy. However, in practical applications, data across agent nodes within the distributed framework often exhibits n…
arxiv
Score 19.0
2026-04-06 · Yuwen Zhai, Runze Li, Liang Wang, Nian Shi, Liwu Xu, Wei Zhang, Ran Lin, Bo Xu, Benlei Cui
Research Track B · General AI
Evaluating GUI agents presents a distinct challenge: trajectories are long, visually grounded, and open-ended, yet evaluation must be both accurate and interpretable. Existing approaches typically apply a single holistic judgment over the entire action-observation sequence, a strategy that proves unreliable on long-hori…
arxiv
Score 19.0
2026-04-20 · Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, William A. P. Smith, Yue Lu
Research Track A · General AI
In continual learning, the primary challenge is to learn new information without forgetting old knowledge. A common solution addresses this trade-off through regularization, penalizing changes to parameters critical for previous tasks. In most cases, this regularization term is directly added to the training loss and o…
huggingface
Score 19.0
2026-05-22 · Bin Lin, Bo Zhao, Boyong Wu, Chao Yan, Chen Wu, Cheng Yi, Chengyuan Yao, Daijiao Liu, Fei Tian, Feng Tian, Haiyang Sun, Haoyang Zhang, Jiangjie Zhen, Jinglan Gong, Jun Chen, Li Xie, Peilin Li, Peng Yang, Pengfei Tan, Qingjian Lin, Runze Li, Shenghua Hu, Siyi Zhou, Wenwen Qu, Xiangyu Li, Xiangyu Tony Zhang, Xuerui Yang, Yang Yang, Yechang Huang, Yu Fu, Yuchu Luo, Yuxin Li, Yuxin Zhang, Zhengyan Sheng, Brian Li, Chang Zeng, Changlin Zhang, Chen Geng, Chenghao Dong, Chengli Feng, Dan Zhou, Danni Wan, Di Chen, Die Zhang, Dongqing Pang, Guanglong Yang, Guoqiang Hu, Huangxi Zhu, Jianzheng Gao, Jinghua Liang, Jinmei Wan, Junjie Yuan, Kang An, Lei Lei, Limin Zhong, Lun Cai, Mengqiang Ren, Min Xu, Mingliang Li, Mingxiao Li, Na Wang, Qiang Tong, Qiaoling Huang, Qingfu Du, Rui Wang, Shengchen Zhou, Shi Qiu, Shihao Peng, Shiliang Yang, Siqi Tu, Tianjiao Deng, Ting Xu, Tong Wang, WeiMing Niu, Wuxun Xie, Xianwei Zhang, Xianyu Feng, Xiaojia Liu, Xing Chen, Xiongbin Wu, Yan Wu, Yang Li, Yi Liu, Yifan Zhang, Yile Liu, Yongshen Long, Yu Luo, Yuanhao Ding, Yuhao Wang, Yuhe Yin, Yunfang Xu, Yuxiang Yang, Zhiguo Huang, Zhiyue Wu, Zichao Li, Zichao Zhou, Daxin Jiang, Future Li, Gang Yu, Xiangyu Zhang, Yibo Zhu
General AI
Unified audio-language modeling has emerged as a prominent trend in modern speech systems, promising to bring the reasoning capabilities of large language models to auditory tasks. However, existing unified foundations often struggle to match the depth of specialized systems across automatic speech recognition (ASR), t…
arxiv
Score 19.0
2026-05-28 · Masafumi Enomoto, Ryoma Obara, Haochen Zhang, Masafumi Oyamada
Research Track B · General AI
HTML observations in LLM-based web agents are extremely long, and while many reduction methods have been proposed, it remains unclear which methods reduce overall agent latency while maintaining performance. The main obstacle is the high cost of end-to-end evaluation: in our experiments, evaluating 11 methods across 32…
huggingface
Score 19.0
2026-05-29 · Chang-Bin Zhang, Yujie Zhong, Qiang Zhang, Kai Han
General AI
While visually grounded Chain-of-Thought (CoT) has emerged as a promising paradigm to enhance fine-grained perception in multimodal large language models (MLLMs), its efficacy during the inference phase remains underexplored. In this work, we empirically find that mandating explicit object boxes in visually grounded Co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.0
2026-06-03 · Jiahua Dong, Wenqi Liang, Hongliu Li, Yang Cong, Duzhen Zhang, Hanbin Zhao, Henghui Ding, Yulun Zhang, Salman Khan, Fahad Shahbaz Khan
Research Track A · General AI
Custom diffusion models (CDMs) have garnered significant interest owing to their remarkable capacity for generating personalized concepts. However, the majority of CDMs unrealistically presume that the user's collection of personalized concepts is static and incapable of incremental growth over time. Furthermore, they …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.0
2026-06-03 · Jiaxi Li, Ke Deng, Yun Wang, Jingyuan Huang, Yucheng Shi, Qiaoyu Tan, Jin Lu, Ninghao Liu
Research Track B · General AI
Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce skills from previous task trajectories and reuse them in future tasks on the fly. However, existing methods mainly reuse s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.0
2026-06-13 · Xinze Zhang
Research Track A · General AI
Visual perception of urban streetscapes underpins evidence-based decisions in landscape planning, public health, and place-making. Yet models trained on a few well-photographed metropolises systematically misjudge underrepresented districts, propagating geographic bias into downstream policy. We address this gap with H…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.0
2026-06-15 · Mao-Lin Luo, Yi-Lin Zhang, Zi-Hao Zhou, Yankun Hong, Xialiang Tong, Mingxuan Yuan, Tong Wei, Min-Ling Zhang
Research Track A · General AI
Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents KeepLoRA++, balancing these objectives through a u…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 19.0
2026-06-29 · Peyman Hosseini, Ondrej Bohdal, Ahmed Alajrami, Andrea Maracani, Ignacio Castro, Matthew Purver, Mete Ozay, Savas Ozkan, Taha Ceritli
General AI
Large Language Model (LLM)-based agents can solve complex procedural tasks by interacting with environments over multiple turns, but this ability typically depends on large models, long contexts, and repeated inference calls. This makes advanced memory-augmented agents difficult to deploy on resource-constrained device…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.9
2026-06-22 · Chuangxin Zhao, Canran Xiao, Siyuan Ma, Mengyao Lyu, Yanbiao Ma, Jun Xia, Guiguang Ding, Yang Liu
Research Track A · General AI
Multimodal large language models (MLLMs) are increasingly required to adapt to non-stationary streams of visual domains, question types, and user instructions, yet continual fine-tuning often causes severe forgetting of previously acquired multimodal skills. Existing continual vision-language methods mainly preserve ou…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.8
2026-05-07 · Tianle Wang, Zhaoyang Wang, Guangchen Lan, Xinpeng Wei, Sipeng Zhang, Guanwen Qiu, Abulhair Saparov
General AI
Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that offers independent …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.8
2026-05-12 · Runhui Huang, Jie Wu, Rui Yang, Zhe Liu, Hengshuang Zhao
General AI
In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model's intrinsic potential to perform advanced re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.8
2026-06-29 · Yuhan Zhang, Zhiyuan Guo, Ziheng Zeng, Wei Wang, Wentao Wu, Lijie Xu
General AI
Long-term conversational agents need to remember and query cross-session, multi-typed information with complex correlations. Existing agent memory systems rely on heterogeneous vector and graph databases, which fragment memory information and cause high cross-database I/O latency. For retrieval, common RAG-style method…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.8
2026-06-29 · Haoyang Li, Guanlin Li, Youhe Feng, Chen Zhao, Zhuoran Wang, Yang Li, Qizhe Wei, Shifeng Bao, Haitao Shen, Yihan Zhao, Tong Yang, Jing Zhang
General AI
Cross-embodiment transfer in vision-language-action (VLA) models remains challenging because low-level state and action spaces differ fundamentally across robot platforms. We observe that the high-level cognitive process underlying manipulation, including scene perception, object identification, task planning, and sub-…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.8
2026-07-02 · Meng Wang, Haohan Zhao, Wenzhuo Liu, Lu Yang, Geng Liu, Haiyang Guo, Guo-Sen Xie, Gaofeng Meng, Hongbin Liu, Fei Zhu
Research Track A · General AI
Continual post-training enables foundation models to acquire new knowledge while preserving existing capabilities. Recent work suggests that on-policy learning can mitigate forgetting, with on-policy self-distillation emerging as a particularly attractive approach. In this work, we revisit this optimistic view through …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-05-04 · Masafumi Oyamada, Kunihiro Takeoka, Kosuke Akimoto, Ryoma Obara, Masafumi Enomoto, Haochen Zhang, Daichi Haraguchi, Takuya Tamura
Research Track B · General AI
What if a browser agent could learn your work simply by watching you do it? We present cotomi Act, a browser-based computer-using agent that combines reliable multi-step task execution with persistent organizational knowledge learned from user behavior. For execution, an agent scaffold with adaptive lazy observation, v…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-05-06 · Andreas Pattichis, Constantine Dovrolis
Research Track A · General AI
LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen wha…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-05-11 · Qianqian Shi, Yue Che, Faqiang Liu, Hongyi Li, Mingkun Xu, Sandra Reinert, Pieter M. Goltstein, Rong Zhao, Luping Shi
Research Track A
Adaptive behavior requires the brain to transition between distinct contexts while maintaining representations of prior experience. The ability to reconfigure neural representations without erasing previously acquired knowledge is central to learning in dynamic environments, yet the neural mechanisms that support this …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-05-12 · Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung, Koushik Sen, Dawn Song
Research Track B · General AI
Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue that benchmarks must be se…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-05-18 · Ali Zindari, Xiaowen Jiang, Rotem Mulayoff, Sebastian U. Stich
Research Track A · General AI
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable compromise between adapting to the fine-tuning distribution and preserving pre-trained behavior…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-05-24 · Yubo Li, Yidi Miao, Yuntian Shen, Yuxin Liu
Research Track B · General AI
Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist model stacks. This raises a central question: can a web agent become more efficient as it accumulates experience, rather than more expensive? We…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-05-28 · Runze Xu, Arpit Garg, Hemanth Saratchandran, Simon Lucey
Research Track A · General AI
Low-Rank Adaptation (LoRA) has become one of the most widely used fine-tuning mechanisms for adapting large language models to new domains, tasks, and users. Yet adaptation performance alone can obscure an important failure mode: LoRA updates may improve performance on the target distribution while degrading prior capa…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.5
2026-05-29 · Qian Kou, Xiaofeng Shi, Yulin Li, Xiaosong Qiu, Xinyang Wang, Hua Zhou, Cao Dongxing
General AI
Multimodal Large Language Models (MLLMs) have demonstrated significant achievements in general visual question answering (VQA) tasks. However, they remain brittle on mechanical engineering drawings, where high annotation density and weak domain knowledge, compounded by unreliable spatial relation reasoning under strict…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-06-09 · Masoume Gholizade, Fabrizio Ruffini, Pietro Ducange, Francesco Marcelloni
Research Track A · General AI
Federated Learning (FL) enables collaborative and privacy-preserving model training across distributed clients, but most existing FL systems implicitly assume data stationarity. In real-world settings such as healthcare, industrial IoT (IIoT), cybersecurity, and smart cities, data streams are inherently non-stationary, …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.5
2026-06-09 · Kwai Keye Team, Bin Wen, Changyi Liu, Chengru Song, Chongling Rao, Guowang Zhang, Han Li, Haonan Fan, Hengrui Ju, Jiankang Chen, Jiapeng Chen, Jiawei Yuan, Kaixuan Yang, Kaiyu Jiang, Kun Gai, Lingzhi Zhou, Na Nie, Sen Na, Tianke Zhang, Tingting Gao, Xuanyu Zheng, Yulong Chen, Fan Yang, Haixuan Gao, Lele Yang, Mingqiao Liu, Muxi Diao, Qi Zhang, Qile Su, Wei Chen, Wentao Hong, Xingyu Lu, Yancheng Long, Yankai Yang, Yingxin Li, Yiyang Fan, Yu Xia, Yuzhe Chen, Ziliang Lai, Chuan Yi, Haonan Jia, Tianming Liang, Weixin Xu, Xiaoxiao Ma, Yang Tian, Yufei Han, Feng Han, Hang Li, Jing Wang, Jinghui Jia, Junmin Chen, Junyu Shi, Ruilin Zhang
General AI
We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, information redundancy, and prohibitive computational costs inherent in hour-level videos, K…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.5
2026-06-12 · Haonan Qi, Jin Cao, Yongqi Zhang, Xintong Wang, Weidong Tang, Bin Chen, Chengfu Huo, Haojun Pan, Hengyu You, Jing Li, Yingde Wang, Liang Ding
General AI
Industrial products such as valves and circuit breakers are defined by dense technical specifications that govern procurement, compatibility, and safety across supply chains. These specifications are scattered across multiple heterogeneous product images, including specification tables, nameplates, and technical drawin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-06-14 · Shuaike Zhang, Shaokun Wang, Haoyu Tang, Jianlong Wu, Liqiang Nie
Research Track A · General AI
Embodied Continual Learning (ECL) aims to enable robots to continually acquire new manipulation tasks while retaining previously learned behaviors under closed-loop control. Compared with conventional continual learning, ECL suffers from more severe catastrophic forgetting. Feature drift accumulated under closed-loop c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-03-15 · Mohamed Aghzal, Gregory J. Stein, Ziyu Yao
Research Track B · General AI
Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze w…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-03-26 · Ünsal Öztürk, Hatef Otroshi Shahreza, Sébastien Marcel
General AI
Multimodal Large Language Models (MLLMs) have recently been explored as face verification systems that determine whether two face images are of the same person. Unlike dedicated face recognition systems, MLLMs approach this task through visual prompting and rely on general visual and reasoning abilities. However, the d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-03-26 · Cristian Lupascu, Alexandru Lupascu
Research Track A · General AI
Large Language Model-based agents increasingly operate in high-stakes, multi-turn settings where factual grounding is critical, yet their memory systems typically rely on flat key-value stores or plain vector retrieval with no mechanism to track the provenance or trustworthiness of stored knowledge. We present Elephant…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-03-30 · Ziqi Miao, Haonan Jia, Lijun Li, Chen Qian, Yuan Xiong, Wenting Yan, Jing Shao
General AI
Reinforcement learning with verifiable rewards (RLVR) has substantially enhanced the reasoning capabilities of multimodal large language models (MLLMs). However, existing RLVR approaches typically rely on outcome-driven optimization that updates both perception and reasoning using a shared reward based solely on the fi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-03-31 · Md Saad, Sajjad Hussain, Mohd Suhaib
General AI
This paper introduces a new hybrid framework that combines Reinforcement Learning (RL) and Large Language Models (LLMs) to improve robotic manipulation tasks. By utilizing RL for accurate low-level control and LLMs for high-level task planning and understanding of natural language, the proposed framework effectively co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-13 · Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
Research Track B · General AI
GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-13 · Xiaozhe Li, Tianyi Lyu, Yizhao Yang, Liang Shan, Siyi Yang, Ligao Zhang, Zhuoyi Huang, Qingwen Liu, Yang Li
Research Track B · General AI
Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context manag…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-13 · Haojie Bai, Aimin Li, Ruoyu Yao, Xiongwei Zhao, Tingting Zhang, Xing Zhang, Lin Gao, Jun Ma
General AI
Closed-loop cooperative driving requires planners that generate realistic multimodal multi-agent trajectories while improving safety and traffic efficiency. Existing diffusion planners can model multimodal behaviors from demonstrations, but they often exhibit weak scene consistency and remain poorly aligned with closed…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-16 · Ke Xu, Yuhao Wang, Yu Wang
General AI
Recent advancements in LLM agents are gradually shifting from reactive, text-based paradigms toward proactive, multimodal interaction. However, existing benchmarks primarily focus on reactive responses, overlooking the complexities of proactive intervention and monitoring. To bridge this gap, we introduce ProVoice-Benc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-16 · Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo
Research Track B · General AI
The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often lea…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-22 · Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele
General AI
Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated great potential for enhancing the reasoning abilities of multimodal large language models (MLLMs). However, the reliance on language-centric priors and expensive manual annotations prevents MLLMs' intrinsic visual understanding and scalable …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-04-30 · Eyon Jang, Damon Falck, Joschka Braun, Nathalie Kirch, Achu Menon, Perusha Moodley, Scott Emmons, Roland S. Zimmermann, David Lindner
General AI
Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment. Successful RL relies on sufficient exploration of diverse actions by the model during training, which creates a potential failure mode: a model could strategically alt…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-04 · Shiyun Xiong, Dongming Wu, Peiwen Sun, Yuang Ai, Bokang Yang, Wencheng Han, Xiao-Hui Li, Xiangyu Yue
General AI
Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly reach performance sa…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-04 · Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go, Kilian Q. Weinberger
General AI
Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the unconfident ones. We show that the discarded tokens are in fact a useful …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-05 · Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu
Research Track B · General AI
Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning-generating…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-09 · Andrew Bo Liu, Samira Nedungadi, Bryce Cai, Alex Kleinman, Harmon Bhasin, Seth Donoughe
General AI
Large language models (LLMs) are rapidly acquiring capabilities relevant to biological research, from literature synthesis to interpretation of experimental data. Increasingly, LLM agents can also perform in silico biology tasks that previously required experienced human biologists. These emerging AI capabilities offer…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-09 · Yikang Yang, Zhanpeng Hu, Youtian Lin, Mengqi Zhou, Jingxi Xu, Feihu Zhang, Jiaheng Liu, Yao Yao
General AI
Multimodal large language models can write code to produce complex programs as well as use programs to do 3D modeling, which opens up a new avenue for 3D generation powered by their priors, world knowledge and reasoning. Yet existing benchmarks rarely evaluate 3D modeling through code. Such modeling demands more than r…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-11 · Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang, Fangchen Yu, Zhijie Zhong, Zijie Guo, Tianshuo Peng, Zhuo Liu, Yi Xie, Xiang Zhuang, Yue Fan, Runmin Ma, Shiyang Feng, Xiangchao Yan, Anran Liu, Peng Ye, Wenlong Zhang, Shufei Zhang, Chunfeng Song, Fenghua Ling, Jie Zhou, Liang He, Bo Zhang, Lei Bai
General AI
Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat "cites" edges, omitting key entities, claims, evidence, mechanisms, and method lineages essential for s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-11 · Dian Zheng, Harry Lee, Manyuan Zhang, Kaituo Feng, Zoey Guo, Ray Zhang, Hongsheng Li
General AI
Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual narratives, guidance, a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-15 · Truong Thanh Hung Nguyen, Khanh Van Quynh Nguyen, Hoang-Loc Cao, Tri Duong, Phuc Ho, Van Pham, Loc Nguyen, Hung Cao
General AI
Accurate Harmonized Tariff Schedule (HTS) code classification is essential for customs clearance, duty assessment, trade statistics, and regulatory compliance in maritime logistics. However, exact HTS classification remains challenging because product descriptions are often short, incomplete, or ambiguous, while correc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-17 · Mohamed Nabail, Leo Cheng, Jingmin Wang, Nicholas Rhinehart
General AI
Preference-based RL provides an approach to learning reward models from pairwise comparisons of behaviors, bypassing the need for explicit reward design. However, existing methods typically rely on passive data collection and suffer from poor sample efficiency, especially during the early stages of learning. We introdu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.2
2026-06-23 · Hovhannes Tamoyan, Sean Narenthiran, Erik Arakelyan, Mira Mezini, Boris Ginsburg
General AI
LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedicated localization frameworks have emerged, yet are still evaluated as file retrieval rather than actionable diagnosis, producing locations without the diagnostic context a re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.2
2026-06-23 · Xiaowei Gao, Pengxiang Li, Yitai Cheng, Ruihan Xu, James Haworth, Stephen Law, Yun Ye
General AI
Recent multimodal large language models (MLLMs) have shown strong potential for autonomous driving scene understanding, yet existing methods still face a fundamental trade-off between temporal reasoning and spatial precision. Models that rely on single-frame or low-resolution inputs often miss small, distant, or partia…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-01-12 · Jihong Wang, Jiamu Zhou, Weiming Zhang, Weiwen Liu, Zhuosheng Zhang, Xingyu Lou, Weinan Zhang, Huarong Deng, Jun Wang
Research Track B · General AI
With the advancement of vision-language models, web automation has made significant progress. However, deploying autonomous agents in real-world settings remains challenging, primarily due to site heterogeneity, where generalist models lack domain-specific priors for diverse interfaces, and long-horizon instability, ch…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-03-09 · Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang
Research Track B · General AI
Modern agents powered by thinking LLMs achieve high accuracy through long chain-of-thought reasoning but incur substantial inference costs. While many LLMs now support configurable reasoning levels (e.g., high/medium/low), static strategies are often ineffective: using low-effort modes at every step leads to significan…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-04-03 · Wei Zou, Mingwen Dong, Miguel Romero Calvo, Shuaichen Chang, Jiang Guo, Dongkyu Lee, Xing Niu, Xiaofei Ma, Yanjun Qi, Jiarong Jiang
Research Track B · General AI
Memory makes LLM-based web agents personalized, powerful, yet exploitable. By storing past interactions to personalize future tasks, agents inadvertently create a persistent attack surface that spans websites and sessions. While existing security research on memory assumes attackers can directly inject into memory stor…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-05-07 · Mei Wu, Wenchao Weng, Wenxin Su, Wenjie Tang, Wei Zhou
Research Track A · General AI
In recent years, the integration of non-topological space modeling with temporal learning methods has emerged as an effective approach for capturing spatio-temporal information in non-Euclidean graphs. However, most existing methods rely on static underlying graph structures, which are inadequate for capturing the cont…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-05-11 · Debashis Guha
Research Track A · General AI
Every adaptive learning system must alternate between two operations: consolidating what it already knows and expanding into new evidence. We propose Consolidation-Expansion Operator Mechanics (OpMech), a framework that makes this structure precise. The central object is the order-gap $\Ogap(θ; e)$, the d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-06-15 · Anqi Zou, Han Deng, Chengyu Zhang, Junquan Hu, Yu Wang, Yuxiang Xing, Aokai Zhang, Hanling Zhang, Zhaoyang Liu, Ben Fei, Zhihui Wang, Wanli Ouyang
Research Track B · General AI
Current computer-use benchmarks primarily focus on software operation tasks in virtualized systems, whereas scientific instrumentation scenarios require coordinated control over complex interfaces, and feedback-driven parameter adjustment. However, directly evaluating agents on physical high-precision instruments is im…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.0
2026-06-29 · Haoliang Han
Research Track A · General AI
Long-running language agents need mechanisms for deciding which experiences should persist after the working context is gone. Retrieval systems can reinsert past text, but they do not by themselves show that an experience has been selectively consolidated into the model's own behavior. We introduce EVAF, an Echo-Valenc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.9
2026-06-23 · Beining Wu, Zihao Ding, Jun Huang, Yanxiao Zhao
Research Track A · General AI
On-device language-model agents improve by accumulating experience in retrieved memory rather than by updating weights. This memory is hard-bounded and exposed: it consumes RAM and energy, reaches peers through a thin uplink, and becomes an attack surface because it is writable by what the agent reads. Existing systems…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-05-02 · Zebin Guo, Weidong Geng, Ruichen Mao
General AI
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding responses in external knowledge during inference. However, conventional RAG systems underperform on structured tabular data, largely due to coarse retrieval granularity and insufficient table semantic comprehension. To address these…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-05-12 · Yuangong Chen, Wai Keung Wong, Jiaxing Li, Ioannis Patras, Xu Zheng
General AI
Multimodal Large Language Models (MLLMs) show strong visual perception, yet remain limited in reasoning about space under changing viewpoints. We study this challenge as Perspective-Conditioned Spatial Reasoning (PCSR) in 360-degree omnidirectional images, where broad scene coverage reduces ambiguity from partial obser…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-05-12 · Alireza Nadali, Patrick Cooper, Ashutosh Trivedi, Alvaro Velasquez
General AI
We introduce KV-Fold, a simple, training-free long-context inference protocol that treats the key-value (KV) cache as the accumulator in a left fold over sequence chunks. At each step, the model processes the next chunk conditioned on the accumulated cache, appends the newly produced keys and values, and passes the enl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-05-22 · Jiarui Guo, Haojia Wei, Yiming Zhang, Yifei Liu, Yuning Gong, Hongjie Zhang, Xue Yang, Zhihang Zhong
General AI
Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes this kind of spatia…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-05-27 · Jizhan Fang, Buqiang Xu, Zhixian Wang, Haoliang Cao, Xinle Deng, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Ying Wei, Guozhou Zheng, Feiyu Xiong, Haofen Wang, Huajun Chen, Ningyu Zhang
Research Track A · General AI
Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines, which is brittle in dynamic agentic environments where feedback, task variation, and heterogeneous signals continuously reshape what should be remembered and how it should be co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-05-28 · Chunru Lin, Hongxin Zhang, Fenghao Yu, Zhehuan Chen, Thomas L. Griffiths, Yejin Choi, David Held, Chuang Gan
General AI
The ability to reason, adapt, and creatively solve problems under unexpected challenges is essential for robots operating in real-world environments. However, current robotic benchmarks primarily emphasize skill-level execution and provide limited insight into such cognitive reasoning capabilities. We introduce RoboWit…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.8
2026-06-30 · Ruijia Zhang, Jiacheng Zhu, Hanqing Zhu, Laixi Shi
General AI
Low-rank adaptation (LoRA) and its variants enable parameter-efficient fine-tuning of large language models under the supervised fine-tuning (SFT) paradigm. However, their efficacy and behavior under reinforcement learning with verifiable rewards (RLVR) are less well understood. In particular, two structurally initiali…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.7
2026-04-23 · Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
Research Track B · General AI
Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic framework built around three integrated comp…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.6
2026-07-02 · Yunhe Li, Hao Shi, Wenhao Liu, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Shuang Qiu, Linqi Song
General AI
On-policy self-distillation (OPSD) has emerged as a practical method for training large language models (LLMs) to reason, where a single model acts as both the teacher and the student with different levels of information access. However, recent studies have found that the teacher's dense token-level supervision, condit…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.6
2026-07-02 · Cristian-Gabriel Florea, Stelian Spînu
General AI
Over 285 million people worldwide live with a visual impairment, for whom everyday tasks such as avoiding obstacles, locating personal belongings, recognizing familiar faces, or handling cash remain persistent obstacles to personal autonomy. Existing assistive applications are typically limited to recognizing predefine…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.5
2026-03-15 · Xudong Wang, Gan Li, Zhiyu Liu, Yao Wang, Lianqing Liu, Zhi Han
Research Track A · General AI
Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong V…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-03-26 · Dingjie Song, Tianlong Xu, Yi-Fan Zhang, Hang Li, Zhiling Yan, Xing Fan, Haoyang Li, Lichao Sun, Qingsong Wen
General AI
Assessing student handwritten scratchwork is crucial for personalized educational feedback but presents unique challenges due to diverse handwriting, complex layouts, and varied problem-solving approaches. Existing educational NLP primarily focuses on textual responses and neglects the complexity and multimodality inhe…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-03-29 · Shijian Wang, Jiarui Jin, Runhao Fu, Zexuan Yan, Xingjian Wang, Mengkang Hu, Eric Wang, Xiaoxi Li, Kangning Zhang, Li Yao, Wenxiang Jiao, Xuelian Cheng, Yuan Lu, Zongyuan Ge
General AI
Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage st…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-03-29 · Shi Qiu, Junyi Deng, Yiwei Deng, Haoran Dong, Jieyu Fu, Mao Li, Zeyu Li, Zhaolong Zhang, Huiwen Zheng, Leidong Bao, Anqi Lv, Zihan Mo, Yadi Niu, Yiyang Peng, Yu Tian, Yili Wang, Ziyu Wang, Zi-Yu Wang, Jiashen Wei, Liuheng Wu, Aoran Xue, Leyi Yang, Guanglu Yuan, Xiarui Zhan, Jingjun Zhang, Zifan Zheng, Pengfei Liu, Linrui Zhen, Kaiyang Li, Qichang Li, Ziheng Zhou, Guo-En Nian, Yunwei Xiao, Qing-Hong Cao, Linjie Dai, Xu Feng, Peng Gao, Ying Gu, Chang Liu, Jia Liu, Ming-xing Luo, Yan-Qing Ma, Liang-You Peng, Huichao Song, Shufeng Wang, Chenxu Wang, Tao Wang, Yi-Nan Wang, Chengyin Wu, Pengwei Zhao, Hua Xing Zhu
General AI
AI agents powered by large language models exhibit strong reasoning and problem-solving capabilities, enabling them to assist scientific research tasks such as formula derivation and code generation. However, whether these agents can reliably perform end-to-end reproduction from real scientific papers remains an open q…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-09 · Makanjuola Ogunleye, Eman Abdelrahman, Ismini Lourentzou
General AI
Large multimodal models are increasingly used as the reasoning core of embodied agents operating in 3D environments, yet they remain prone to hallucinations that can produce unsafe and ungrounded decisions. Existing inference-time hallucination mitigation methods largely target 2D vision-language settings and do not tr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-14 · Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi
General AI
Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, w…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-17 · Rohit Sinha, Aditya Kanade, Sai Srinivas Kancheti, Vineeth N Balasubramanian, Tanuja Ganu
General AI
Multimodal large language models (MLLMs) have achieved impressive progress on vision-language benchmarks, yet their capacity for visual cognitive and visuospatial reasoning remains less understood. We introduce "Mind's Eye", a multiple-choice benchmark of eight visuo-cognitive tasks inspired by classic human intelligen…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-22 · Muzhi Zhu, Shunyao Jiang, Huanyi Zheng, Zekai Luo, Hao Zhong, Anzhou Li, Kaijun Wang, Jintao Rong, Yang Liu, Hao Chen, Tao Lin, Chunhua Shen
General AI
Spatial intelligence is essential for multimodal large language models, yet current benchmarks largely assess it only from an understanding perspective. We ask whether modern generative or unified multimodal models also possess generative spatial intelligence (GSI), the ability to respect and manipulate 3D spatial cons…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-22 · Juyong Jiang, Chenglin Cai, Chansung Park, Jiasi Shen, Sunghun Kim, Jianguo Li, Yue Wang
General AI
While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic multi-page websites remain highly challenging. Existing works are often limited to single-page static websites, while agentic frameworks typically rely on multi-turn execu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.5
2026-04-29 · Qisheng Hu, Quanyu Long, Wenya Wang
Research Track A · General AI
Memory-augmented LLM agents offer an appealing shortcut to continual learning: rather than updating model parameters, they accumulate experience in external memory, seemingly sidestepping the stability-plasticity dilemma of parametric learning. We show that this challenge does not disappear but resurfaces at the memory…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-04-30 · Zihao Li, Jiaru Zou, Feihao Fang, Xuying Ning, Mengting Ai, Tianxin Wei, Sirui Chen, Xiyuan Yang, Jingrui He
General AI
Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address special…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-06-04 · Jiayu Liu, Cheng Qian, Zhenhailong Wang, Bingxuan Li, Jiateng Liu, Heng Wang, Jeonghwan Kim, Yumeng Wang, Xiusi Chen, Yi R. Fung, Heng Ji
General AI
Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual constraints. To addre…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-06-04 · Ashutosh Hathidara, Sai Shruthi Sistla, Sebastian Schreiber, Sahil Bansal
General AI
Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encoders that may under-capture specialized tool semantics, parametric tool retrieval addresses this by encoding each tool as a virtual token appended to t…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-06-08 · Haoran Sun, Wenjie Li, Yujie Zhang, Zekai Lin, Fanrui Zhang, Kaitao Chen, Xingqi He, Yichen Li, Mianxin Liu, Lei Liu, Yankai Jiang
General AI
Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are redundant, noisy, a…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-06-08 · Han Zhou, Adam X. Yang, Laurence Aitchison, Anna Korhonen, Albert Q. Jiang
General AI
Reinforcement learning with verifiable rewards (RLVR) has become a leading paradigm for improving the reasoning ability of large language models through outcome-based supervision. However, verifiable rewards frequently become uninformative at the group level: when all sampled traces of a given prompt receive identical …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.5
2026-06-10 · Ahmed Sharshar, Naveen Kumar Kummari, Mohsen Guizani
Research Track A · General AI
Continual learning (CL) models often use experience replay to reduce catastrophic forgetting, but their robustness to replay sampling interference remains underexplored. Existing CL attacks alter inputs or training pipelines (poisoning/backdoors) and rarely include explicit auditable constraints, limiting realism. Here…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-06-16 · Jian Yang, Shawn Guo, Wei Zhang, Tianyu Zheng, Yaxin Du, Haau-Sing Li, Jiajun Wu, Yue Song, Yan Xing, Qingsong Cai, Zelong Huang, Chuan Hao, Ran Tao, Xianglong Liu, Wayne Xin Zhao, Mingjie Tang, Weifeng Lv, Ming Zhou, Bryan Dai
General AI
Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.5
2026-06-19 · Jianwei Lou
Research Track A · General AI
Continual learning that is gradient-free, local, online, and append-only is attractive for edge and streaming deployment, but its value is usually argued informally. We give a provable account on recurring-regime streams. Given segmentation, a warm-start library learner attains amortized recovery cost $O\!\big(KD/\vare…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-03-26 · Abdullah Hamdi, Changchun Yang, Xin Gao
General AI
Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-03-26 · Liang Zhang, Yu Fu, Xinyi Jin
General AI
Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship us…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-03-26 · André G. Viveiros, Nuno Gonçalves, Matthias Lindemann, André Martins
General AI
While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. While recent approaches…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-03-26 · Jiwook Han, Geo Ahn, Youngrae Kim, Jinwoo Choi
General AI
Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memorize dataset-specific s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-03-26 · Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, Guanjun Jiang
General AI
Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or seq…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-03-30 · Huanxuan Liao, Zhongtao Jiang, Yupu Hao, Yuqiao Tan, Shizhu He, Jun Zhao, Kun Xu, Kang Liu
General AI
Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long temporal context prohibitive. We argue that the bottleneck lies not in how post-encoding representations are compresse…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-03-31 · Fumihiko Tsuchiya, Taiki Miyanishi, Mahiro Ukai, Nakamasa Inoue, Shuhei Kurita, Yusuke Iwasawa, Yutaka Matsuo
General AI
Counting in long videos remains a fundamental yet underexplored challenge in computer vision. Real-world recordings often span tens of minutes or longer and contain sparse, diverse events, making long-range temporal reasoning particularly difficult. However, most existing video counting benchmarks focus on short clips …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-06 · Ke Li, Maoliang Li, Jialiang Chen, Jiayu Chen, Zihao Zheng, Shaoqi Wang, Xiang Chen
General AI
Video mashup creation represents a complex video editing paradigm that recomposes existing footage to craft engaging audio-visual experiences, demanding intricate orchestration across semantic, visual, and auditory dimensions and multiple levels. However, existing automated editing frameworks often overlook the cross-l…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-06 · Shuai Liu, Shulin Tian, Kairui Hu, Yuhao Dong, Zhe Yang, Bo Li, Jingkang Yang, Chen Change Loy, Ziwei Liu
General AI
Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints, as strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent scalable training an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-07 · Wang Yang, Chaoda Song, Xinpeng Li, Debargha Ganguly, Chuang Ma, Shouren Wang, Zhihao Dou, Yuli Zhou, Vipin Chaudhary, Xiaotian Han
General AI
Existing agent benchmarks suffer from two critical limitations: high environment interaction overhead (up to 41% of total evaluation time) and imbalanced task horizon and difficulty distributions that make aggregate scores unreliable. To address these issues, we propose ACE-Bench, built around a unified grid-based plan…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-07 · Bowen Ye, Rang Li, Qibin Yang, Yuanxin Liu, Linli Yao, Hanglong Lv, Zhihui Xie, Chenxin An, Lei Li, Lingpeng Kong, Qi Liu, Zhifang Sui, Tong Yang
General AI
Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-world software environments. However, existing agent benchmarks suffer from three critical limitations: (1) trajectory-opaque grading that checks only final outputs, (2) underspecified safety and robustness evalu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-07 · Juekai Lin, Yun Zhu, Honglin Lin, Sijing Li, Tianwei Lin, Zheng Liu, Xiaoyang Wang, Wenqiao Zhang, Lijun Wu
General AI
Graphics Program Synthesis is pivotal for interpreting and editing visual data, effectively facilitating the reverse-engineering of static visuals into editable TikZ code. While TikZ is the de facto standard for scientific schematics due to its programmatic flexibility, its requirement for rigorous spatial precision pr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-09 · Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna
Research Track B · General AI
Web agents--autonomous systems that navigate and execute tasks on the web on behalf of users--have the potential to transform how people interact with the digital world. However, the most capable web agents today rely on proprietary models with undisclosed training data and recipes, limiting scientific understanding, r…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-09 · Hang Ye, Xiaoxuan Ma, Fan Lu, Wayne Wu, Kwan-Yee Lin, Yizhou Wang
General AI
Digital human generation has been studied for decades and supports a wide range of real-world applications. However, most existing systems are passively animated, relying on privileged state or scripted control, which limits scalability to novel environments. We instead ask: how can digital humans actively behave using…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-12 · Cheng-Yen Li, Xuanjun Chen, Claire Lin, Wei-Yu Chen, Wenhua Nie, Hung-Yi Lee, Jyh-Shing Roger Jang
Research Track A · General AI
Large Language Models (LLMs) struggle with knowledge-intensive tasks due to hallucinations and fragmented reasoning over dispersed information. While Retrieval-Augmented Generation (RAG) grounds generation in external sources, existing methods often treat evidence as isolated units, failing to reconstruct the logical c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-13 · Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak
General AI
We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathem…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-13 · Artem Gadzhiev, Andrew Kislov
General AI
Providing AI agents with reliable long-term memory that does not hallucinate remains an open problem. Current approaches to memory for LLM agents -- sliding windows, summarization, embedding-based RAG, and flat fact extraction -- each reduce token cost but introduce catastrophic information loss, semantic drift, or unc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-14 · Sophia Sirko-Galouchenko, Monika Wysoczanska, Andrei Bursuc, Nicolas Thome, Spyros Gidaris
General AI
Multimodal large language models (MLLMs) perform well on many vision-language tasks but often struggle with vision-centric problems that require fine-grained visual reasoning. Recent evidence suggests that this limitation arises not from weak visual representations, but from under-utilization of visual information duri…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-14 · Yulin Chen, Tri Cao, Haoran Li, Yue Liu, Yibo Li, Yufei He, Le Minh Khoi, Yangqiu Song, Shuicheng Yan, Bryan Hooi
Research Track B · General AI
Web agents powered by vision-language models (VLMs) enable autonomous interaction with web environments by perceiving and acting on both visual and textual webpage content to accomplish user-specified tasks. However, they are highly vulnerable to prompt injection attacks, where adversarial instructions embedded in HTML…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-18 · Pollawat Hongwimol, Haoning Shang, Chutong Wang, Zhichao Wan, Yi Gao, Yuanming Li, Lin Gui, Wenhao Sun, Cheng Yu
Research Track A · General AI
Product attribute extraction in e-commerce is bottlenecked by ontologies that are inconsistent, incomplete, and costly to maintain. We present AutoPKG, a multi-agent Large Language Model (LLM) framework that automatically constructs a Product-attribute Knowledge Graph (PKG) from multimodal product content. AutoPKG indu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-20 · Terry Leitch
General AI
We present a systematic evaluation of large language model families -- spanning both proprietary cloud APIs and locally-hosted open-source models -- on two purpose-built benchmarks for System Dynamics AI assistance: the CLD Leaderboard (53 tests, structured causal loop diagram extraction) and the Discu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-20 · Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba
General AI
Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems toget…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-20 · Harish Santhanalakshmi Ganesan
General AI
Persistent memory is the bottleneck separating stateless chatbots from long-running agentic systems. Retrieval-augmented generation (RAG) over flat vector stores fragments facts into chunks, loses cross-session identity, and has no first-class notion of supersession or contradiction. Recent bitemporal knowledge-graph s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-21 · Anton Kolonin, Alexey Glushchenko, Evgeny Bochkov, Abhishek Saxena
General AI
Evaluating the reasoning capabilities of Large Language Models (LLMs) for complex, quantitative financial tasks is a critical and unsolved challenge. Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations. To address this, we introduce a novel evaluation methodol…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-21 · Jing Jin, Hao Liu, Yan Bai, Yihang Lou, Zhenke Wang, Tianrun Yuan, Juntong Chen, Yongkang Zhu, Fanhu Zeng, Xuanyu Zhu, Yige Xu
General AI
Multimodal large language models (MLLMs) have shown promising reasoning abilities, yet evaluating their performance in specialized domains remains challenging. STEM reasoning is a particularly valuable testbed because it provides highly verifiable feedback, but existing benchmarks often permit unimodal shortcuts due to…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-27 · Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov
Research Track B · General AI
Existing web agent benchmarks have largely converged on short, single-site tasks on which frontier models are approaching saturation. However, real-world web use consists of long-horizon, multi-site workflows. Common web navigation tasks, such as comparing products across different domains, planning trips across multipl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-29 · Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue, Kefei Chen, Yu Zhuang, Haoxiang Guan, Jiyan He, Jian Li, Yitong Duan, Yu Shi, Mengting Hu, Shuxin Zheng
General AI
Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. Just as interactive environments have often dr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-30 · Bo Zhang, Tzu-Yen Ma, Zichen Tang, Junpeng Ding, Zirui Wang, Yizhuo Zhao, Peilin Gao, Zijie Xi, Zixin Ding, Haiyang Sun, Haocheng Gao, Yuan Liu, Liangjia Wang, Yiling Huang, Yujie Wang, Yuyue Zhang, Ronghui Xi, Yuanze Li, Jiacheng Liu, Zhongjun Yang, Haihong E
General AI
We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS features three key advances: (1) Domain-Specific Complexity: covering seven academic categories with 39 fine-grained subtypes, exposing intrinsic forensic difficulty, where e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-04-30 · Jialu Shen, Han Lyu, Suyang Zhong, Hanzheng Li, Haoyi Tao, Nan Wang, Changhong Chen, Xi Fang
General AI
Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-specific characteristics. Here we introduce SpecVQA, a professional scientific-image benchmark for evaluating multimodal mod…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-05-01 · Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Zefeng He, Muxin Fu, Daizong Liu, Wei-Long Zheng, Yu Cheng
General AI
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence lengt…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-06-08 · Yixia Li, Hongru Wang, Peng Lai, Zhiwen Ruan, He Zhu, Youxin Zhu, Ganlong Zhao, Minda Hu, Yun Chen, Sibei Yang, Peng Li, Jeff Z. Pan, Jia Pan, Guanhua Chen, Yang Liu, Guanbin Li
General AI
Large language model (LLM)-based agents are increasingly used in interactive textual environments, from web navigation and code editing to tool use and long-horizon dialogue. Yet many remain largely reactive, mapping observations to actions without an explicit model of how these environments are structured and evolve. …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-06-08 · Letian Li, Chao Shen, Shuzhao Xie, Chenghao Gu, ZhengXiao He, Yu Meng, Xin Yang, Wenyuan Jiang, Zhi Wang
General AI
Text-driven indoor scene generation and editing require an intermediate representation that language models can both produce and revise. Existing LLM-based systems often rely on scene graphs or global constraint lists, which are compact but underspecify local geometry and make instruction-based edits difficult to local…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-06-11 · Jundong Xu, Qingchuan Li, Jiaying Wu, Yihuai Lan, Shuyue Stella Li, Huichi Zhou, Bowen Jiang, Lei Wang, Jun Wang, Anh Tuan Luu, Caiming Xiong, Hae Won Park, Bryan Hooi, Zhiyuan Hu
General AI
Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-06-15 · Anzhe Xie, Weihang Su, Yujia Zhou, Yiqun Liu, Qingyao Ai
General AI
Meta-analysis is a demanding form of evidence synthesis that combines literature retrieval, PI/ECO-guided study selection, and statistical aggregation. Its structured, verifiable workflow makes it an ideal substrate for evaluating systematic scientific reasoning, yet existing benchmarks lack ground truth across the ful…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-06-15 · Minghang Zhu, Chuyang Wei, Junhao Xu, Yilin Cheng, Zhumin Chen, Jiyan He
General AI
Deep research agents synthesize long-form reports by searching and reasoning over retrieved evidence. Reinforcement learning with rubric-based rewards improves these agents by optimizing them against checkable criteria that translate report quality into reward signals, but its efficiency depends on whether those criter…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-06-15 · Haonan Ge, Yiwei Wang, Hang Wu, Yujun Cai
Research Track A · General AI
Streaming video understanding models must answer queries at any moment during an ongoing stream, using only what they have observed so far and under fixed memory and computation budgets. Existing methods address this by adding memory banks, retrieval modules, or visual token compression to preserve long-range history. …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.2
2026-06-28 · Mengqi Yuan, Zilong Zhou, Xinzhuang Xiong, Weiming Wu, Jiayang Sun, Jiamin Song, Kaiqian Cui, Bowen Wang, Haoyuan Wu, Yitong Li, Dunjie Lu, Haikong Lu, Qi Zhen, Xinyuan Wang, Jiaqi Deng, Yuhao Yang, Cheng Chen, Boyuan Zheng, Alex Su, Xiao Yu, Hao Zou, Saaket Agashe, Xing Han Lu, Manpreet Kaur, Zhengyang Qi, Vincent Sunn Chen, Frederic Sala, Dayiheng Liu, Junyang Lin, Zhou Yu, Yu Su, Siva Reddy, Xin Eric Wang, Peng Qi, Tianbao Xie, Tao Yu
Research Track B · General AI
Existing computer-use benchmarks fail to capture the realism, complexity, and long-horizon demands of real-world computer use, limiting their ability to reveal the limitations of frontier agents. We introduce OSWorld 2.0, a benchmark of 108 long-horizon computer-use workflows across everyday and professional tasks, des…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-04-14 · Zhiyuan Zeng, Jiameng Huang, Zhangyue Yin, Jiashuo Liu, Ziniu Li, Bingrui Li, Yuhao Wu, Yining Zheng, Ge Zhang, Wenhao Huang, Xipeng Qiu
General AI
Reinforcement learning with verifiable rewards (RLVR) has become a central paradigm for improving reasoning and code generation in large language models, and GRPO-style training is widely adopted for its simplicity and effectiveness. However, an important design choice remains underexplored: how token-level policy grad…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-04-18 · Zhaokang Liao, Yingguo Gao, Yi Yang, Yongheng Hu, Jingting Ding
Research Track A · General AI
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach to improve the reasoning abilities of Large Language Models (LLMs). Among RLVR algorithms, Group Relative Policy Optimization (GRPO) and its variants have demonstrated strong performance and high training efficiency. However, GRPO…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-04-24 · Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia
Research Track B · General AI
As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-05-01 · Steven Tang, Xinze Xiong, Anna Hakhverdyan, Andrew Patterson, Jacob Adkins, Jiamin He, Esraa Elelimy, Parham Mohammad Panahi, Martha White, Adam White
Research Track A · General AI
In continual reinforcement learning (CRL), good performance requires never-ending learning, acting, and exploration in a big, partially observable world. Most CRL experiments have focused on loss of plasticity -- the inability to keep learning -- in one-off experiments where some unobservable non-stationarity is added …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-05-01 · Ziwen Zhao, Menglin Yang
General AI
Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cro…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-05-03 · Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang
General AI
Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, loc…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-05-04 · Ruoqi Liu, Imran Q. Mohiuddin, Austin J. Schoeffler, Kavita Renduchintala, Ashwin Nayak, Prasantha L. Vemu, Shivam C. Vedak, Kameron C. Black, John L. Havlik, Isaac Ogunmola, Stephen P. Ma, Roopa Dhatt, Jonathan H. Chen
General AI
We introduce PhysicianBench, a benchmark for evaluating LLM agents on physician tasks grounded in real clinical settings within electronic health record (EHR) environments. Existing medical agent benchmarks primarily focus on static knowledge recall, single-step atomic actions, or action intent without verifiable execut…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-05-06 · Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Stjepan Picek, Saraga Sakthidharan
Research Track A · General AI
The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank Adaptation (LoRA) modules. However, integrating these third-party adapters often induces catastrophic forgetting of the base model's foundational safety alignment. Restor…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.0
2026-05-28 · Xiaohang Tang, Keyue Jiang, Che Liu, Qifang Zhao, Xiaoxiao Xu, Sangwoong Yoon, Ilija Bogunovic
General AI
Reinforcement learning (RL) can be used to improve the policy (denoiser) of diffusion large language models (dLLMs), while being hindered by the intractability of the policy likelihood. A dominant and efficient family of methods replaces the likelihood in standard RL with its evidence lower bound (ELBO), estimated from…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-06-07 · Nazreen Shah, Govinda Arya, Bharath B. N., Ranjitha Prasad
Research Track A · General AI
In many real-world settings, data streams are nonstationary and arrive sequentially, requiring learning systems to adapt continuously without retraining from scratch. Continual learning (CL) addresses this challenge by incorporating new tasks while mitigating catastrophic forgetting, where learning new information degr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-06-10 · Dayananda Herurkar, Federico Raue, Joachim Folz, Jörn Hees, Andreas Dengel
Research Track A · General AI
Continual anomaly detection in tabular data is challenging and remains largely underexplored, particularly in settings with heterogeneous feature schemas, distribution shifts, and severe class imbalance. In many real-world applications, data arrive sequentially from diverse domains, rendering conventional continual lea…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-06-12 · Oxana Salish, Kuniyilh S
Research Track A
Internet of Things (IoT) and Cyber-physical systems (CPS) increasingly rely on continual learning (CL) to adapt to evolving environments, device heterogeneity, and concept drift, thereby improving overall utility. While continual adaptation is essential for long-lived IoT deployments where data patterns evolve, it also…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-06-19 · Jiacheng Wang, Xinjia He, Qi Ding, Yutao Yang, Jie Zhou, Liyang Yu, Liang Dou, Qin Chen
Research Track A · General AI
Continual learning (CL) is commonly studied under the assumption that sequential tasks are semantically related or structurally similar. However, in highly heterogeneous settings, where tasks differ substantially in reasoning patterns and input-output formats, existing methods often suffer from catastrophic forgetting …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-03 · Arash Ahmadi, Sarah Sharif, Yaser Banad
General AI
Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives policy optimization. This paper introduc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-04 · Chenchen Zhang
General AI
As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individual actions but also how work is spawned, delegated, communicated, aggregated, and stopped. This paper studies RL for LLM-based multi-agent systems through orchestration…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-07 · Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah, Corey Feld
General AI
Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation (RAG) that does not…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-20 · Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi
Research Track B · General AI
LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requirin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-21 · Haokun Wen, Xuemeng Song, Xinghao Xie, Xiaolin Chen, Xiangyu Zhao, Weili Guan
General AI
Fashion image retrieval is a cornerstone of modern e-commerce systems. A unified framework that supports diverse query formats and search intentions is highly desired in practice. However, existing approaches focus on narrow retrieval tasks and do not fully capture such diversity. Therefore, in this work, we aim to dev…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-22 · Fen Wang, Zekai Shao, Qiman Kang, Chunran Hu, Zhixuan Zhang, Lexu Xie, Chao Liu, Siming Chen
General AI
Chart descriptions are essential for accessibility, cross-modal retrieval, and assisting readers in extracting insights from complex visualizations. As multimodal large language models (MLLMs) are increasingly adopted for automated chart description generation, a critical question arises: how faithfully and insightfull…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-22 · Jiazhen Pan, Weixiang Shen, Jun Li, Julian Canisius, Felix Bitzer, Paula Roßmüller, Jiancheng Yang, Virginie Kreutzinger, Daniel Rueckert, Benedikt Wiestler
General AI
Medical diagnosis is not a single prediction from a fully specified vignette. It is a sequential workup: clinicians decide what evidence to obtain, revise a differential diagnosis, and stop when the diagnosis is sufficiently supported. Most medical AI benchmarks instead reveal the relevant context upfront and score onl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-22 · Rim Assouel, Amir Bar, Michal Drozdzal, Adriana Romero-Soriano
General AI
Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks. In this work, we propose Procedurally Generated Tasks (PGT), a simple data-driven framework that serves a dual purpose: inducing fine-grained visual understanding and acting as a l…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-28 · Weihan Peng, Chenxu Zhang, Qianao Wang, Yuling Shi, Heng Lian, Qihong Mao, Jiahao Pang, Chunliang Feng, Bowen Li, Xiaodong Gu
General AI
While LLM agents have demonstrated remarkable task-oriented abilities such as planning, reasoning, and action, few works have treated them as complete human personalities where emotional dimensions hold equal importance. In this paper, we introduce a novel benchmark to systematically assess whether LLM agents can simul…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-29 · Tao Zou, Yichen He, Tian Qiu, Yuan Lin, Hang Li
Research Track A · General AI
Long-term memory is essential for multimodal agents to build coherent experience, accumulate world knowledge, and achieve continual learning. However, constructing effective memory goes beyond memory module design and basic requirements such as accuracy and fidelity; the key challenge lies in determining what to memori…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.8
2026-05-29 · Zhiyu Huang, Johnson Liu, Rui Song, Zewei Zhou, Ruining Yang, Yun Zhang, Tianhui Cai, Hanyin Zhang, Mingxuan Gao, Valeria Xu, Jiali Chen, Yishan Shen, Yiluan Guo, Tony Qi, Jiaqi Ma
General AI
Reasoning is essential for autonomous driving (AD) in long-tail scenarios, where vehicles must apply commonsense knowledge, understand spatial relations, infer agent interactions, and make safe decisions. However, existing AD datasets and benchmarks mainly target perception, prediction, or planning, and provide limited…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.8
2026-07-02 · Yueqi Song, Lintang Sutawika, Jiarui Liu, Lindia Tjuatja, Jiayi Geng, Yunze Xiao, Daniel Lee, Aditya Bharat Soni, Vincent Lo, Xiang Yue, Graham Neubig
General AI
Evaluating LLM agents on benchmarks like SWE-Bench and GAIA can be expensive, time-consuming, and requires complex infrastructure. A single evaluation can cost thousands of dollars and take days to complete. In contrast, non-agentic LLM benchmarks that test individual capabilities (e.g., reasoning, code generation) are…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.6
2026-07-02 · Yuxuan Li, Lingxi Xie, Xinyue Huo, Jihao Qiu, Jiacheng Shao, Pengfei Chen, Jiannan Ge, Kaiwen Duan, Qi Tian
General AI
Long-form TV dramas present a formidable challenge for comprehensive video understanding, where deciphering complex storylines often relies on speaker recognition, the task of accurately attributing each spoken utterance to its respective character. In this paper, we advance this field through two primary contr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.6
2026-07-02 · Song Tang, Shuming Hu, Xincheng Shuai, Henghui Ding, Yu-Gang Jiang
General AI
Existing referring segmentation models passively process static images captured from fixed perspectives, limiting their applicability in Embodied AI, where agents must perform active perception in continuous 360° environments. To bridge this gap, we introduce a novel task: Active Panoramic Referring Segmenta…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.6
2026-07-02 · Caleb Ziems, William Held, Su Doga Karaca, David Grusky, Tatsunori Hashimoto, Diyi Yang
General AI
Large Language Model (LLM) social simulations are a promising research method, but they are not yet faithful enough to be adopted widely. In this work, we investigate whether the current scaling paradigm in language modeling is likely to close these gaps, or whether simulation fidelity is orthogonal to general capabili…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-01-08 · Uday Allu, Sonu Kedia, Tanmay Odapally, Biddwan Ahmed
General AI
Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking strategies to balance retrieval quality, latency, and operational cost. Traditional chunking approaches, such as fixed-size, rule-based, or fully agentic chunking, often suffer from high token consumption, redundant text gener…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-03-19 · Minhua Lin, Zhiwei Zhang, Hanqing Lu, Hui Liu, Xianfeng Tang, Qi He, Xiang Zhang, Suhang Wang
General AI
Memory-augmented LLM agents maintain external memory banks to support long-horizon interaction, yet most existing systems treat construction, retrieval, and utilization as isolated subroutines. This creates two coupled challenges: strategic blindness on the forward path of the memory cycle, where construction and retri…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-01 · Shuguang Chen, Adil Hafeez, Salman Paracha
General AI
Agentic applications based on large language models increasingly rely on multi-step interaction loops involving planning, action execution, and environment feedback. While such systems are now deployed at scale, improving them post-deployment remains challenging. Agent trajectories are voluminous and non-deterministic,…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-03 · Shufan Jiang, Chios Chen, Zhiyang Chen
General AI
The autonomous discovery of bugs remains a significant challenge in modern software development. Compared to code generation, the complexity of dynamic runtime environments makes bug discovery considerably harder for large language models (LLMs). In this paper, we take game development as a representative domain and in…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-04-06 · Varun Pratap Bhardwaj
Research Track A · General AI
AI coding agents operate in a paradox: they possess vast parametric knowledge yet cannot remember a conversation from an hour ago. Existing memory systems store text in vector databases with single-channel retrieval, require cloud LLMs for core operations, and implement none of the cognitive processes that make human m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-04-07 · Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai
Research Track A · General AI
The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast we…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-09 · Sai Srinivas Kancheti, Aditya Kanade, Rohit Sinha, Vineeth N Balasubramanian, Tanuja Ganu
General AI
Multimodal reasoning models (MRMs) trained with reinforcement learning with verifiable rewards (RLVR) show improved accuracy on visual reasoning benchmarks. However, we observe that accuracy gains often come at the cost of reasoning quality: generated Chain-of-Thought (CoT) traces are frequently inconsistent with the f…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-09 · Ailing Zeng, Casper Yang, Chauncey Ge, Eddie Zhang, Garvey Xu, Gavin Lin, Gilbert Gu, Jeremy Pi, Leo Li, Mingyi Shi, Sheng Bi, Steven Tang, Thorn Hang, Tobey Guo, Vincent Li, Xin Tong, Yikang Li, Yuchen Sun, Yue Zhao, Yuhan Lu, Yuwei Li, Zane Zhang, Zeshi Yang, Zi Ye
General AI
Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Learning such performance from video is a promising alternative to traditional 3D pipelines. However, existing video models struggle to jointly achieve high expressiveness,…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-13 · Yinuo Yang, Zixian Ma, Manasi Ganti, Jieyu Zhang, Ranjay Krishna
General AI
We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward models evaluate each response independently, requiring multiple forward passes, one for each potential response. Our approach concatenates multiple responses with separato…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-14 · Tomer Ashuach, Liat Ein-Dor, Shai Gretz, Yoav Katz, Yonatan Belinkov
General AI
Humans use introspection to evaluate their understanding through private internal states inaccessible to external observers. We investigate whether large language models possess similar privileged knowledge about answer correctness, information unavailable through external observation. We train correctness classifiers …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-15 · Genghan Zhang, Shaowei Zhu, Anjiang Wei, Zhenyu Song, Allen Nie, Zhen Jia, Nandita Vijaykumar, Yida Wang, Kunle Olukotun
General AI
We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-17 · Jize Wang, Xuanxuan Liu, Yining Li, Songyang Zhang, Yijun Wang, Zifei Shan, Xinyi Le, Cailian Chen, Xinping Guan, Dacheng Tao
General AI
The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements, relying on AI-generated queries, dummy tools, and limited system-level coordination…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-21 · Hongnan Ma, Han Wang, Shenglin Wang, Tieyue Yin, Yiwei Shi, Yucong Huang, Yingtian Zou, Muning Wen, Mengyue Yang
General AI
Large language models can generate plausible game code, but turning this capability into iterative creative improvement remains difficult. In practice, single-shot generation often produces brittle runtime behavior, weak accumulation of experience across versions, and creativity scores that are too subjective to serve …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-21 · Zijie Li, Yichun Shi, Jingxiang Sun, Ye Wang, Yixuan Huang, Zhiyao Guo, Xiaochen Lian, Peihao Zhu, Yu Tian, Zhonghua Zhai, Peng Wang
General AI
We present MMCORE, a unified framework designed for multimodal image generation and editing. MMCORE leverages a pre-trained Vision-Language Model (VLM) to predict semantic visual embeddings via learnable query tokens, which subsequently serve as conditioning signals for a diffusion model. This streamlined design effect…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-21 · Xiachong Feng, Yi Jiang, Xiaocheng Feng, Deyi Yin, Libo Qin, Yangfan Ye, Lei Huang, Weitao Ma, Yuxuan Gu, Chonghan Qin, Bing Qin, Lingpeng Kong
General AI
Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-04-26 · Alexander Bering
Research Track A · General AI
Despite a century of empirical memory research, existing AI agent memory systems rely on system-engineering metaphors (virtual-memory paging, flat LLM storage, Zettelkasten notes), none integrating principles of consolidation, forgetting, and reconsolidation. We present ZenBrain, a multi-layer memory architecture integ…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-04-27 · Yingqian Min, Kun Zhou, Yifan Li, Yuhuan Wu, Han Peng, Yifan Du, Wayne Xin Zhao, Min Yang, Ji-Rong Wen
General AI
Recent advancements in reinforcement learning with verifiable rewards (RLVR) have significantly improved the complex reasoning ability of vision-language models (VLMs). However, its outcome-level supervision is too coarse to diagnose and correct errors within the reasoning chain. To this end, we propose Perceval, a pro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-05-04 · Joern Hentsch
Research Track A · General AI
Continual learning systems face a fundamental tension between plasticity -- acquiring new knowledge -- and stability -- retaining prior knowledge. We introduce MPCS (Multi-Plasticity Continual System), a neuroplastic architecture that integrates eleven complementary mechanisms: task-driven neurogenesis, Fourier-encoded…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-05-12 · Rodney A Sanchez, Ferat Sahin, Alex Ororbia, Jamison Heard
Research Track A · General AI
Advancements in reinforcement learning have produced a variety of complex and useful intrinsic driving forces; crucially, these drivers operate under a direct conditioning paradigm. This form of conditioning limits our agents' capacity by restricting how they learn from the environment as well as from others. Off-polic…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-05-21 · Dianzhi Yu, Vireo Zhang, Hongru Wang, Yanyu Chen, Minda Hu, Wanghan Xu, Siki Chen, Philip Torr, Zhenfei Yin, Irwin King
Research Track A · General AI
Achieving self-evolution in intelligent agents requires the continual accumulation of new knowledge across changing task sequences without forgetting previously acquired abilities. Existing approaches either internalize knowledge by updating model parameters, which induces catastrophic forgetting, or rely on external m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-05-28 · Kellian Cottart, Théo Ballet, Djohan Bonnet, Damien Querlioz
Research Track A · General AI
Always-on edge systems must keep learning as conditions change under tight compute budgets and must detect unreliable predictions. Bayesian binary neural networks are attractive in this setting, but mean-field Bernoulli posteriors can saturate on long non-stationary streams, wiping out epistemic uncertainty and freezin…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-06-01 · Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita, Dmytro Kyrylenko, Sofiia Pidturkina, Julia Stadnyk
General AI
Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments, and shifting market assumptions, while the agent answers, retrieves, acts, and forgets. In finance, this is not just inconvenient. In tasks…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-06-04 · Haibo Wang, Lifu Huang
General AI
Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, resulting in representations that fail to maintain geometric and spatial consistency across video frames. Given the scarcity of large-scale 3D data, we present GeoVR, a novel framework that learns geometric repr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-06-07 · Anthony Bazhenov, Jean Erik Delanois, Giri P. Krishnan
Research Track A
One of the critical limitations of artificial neural networks is their lack of ability to continually learn: training on new tasks often leads to interference and forgetting of the previous ones. While several algorithms have been proposed to protect old memories from interference, they are typically applied during or …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-06-08 · Siyuan Liu, Jinyang Wu
General AI
Multimodal large language models (MLLMs) commonly inherit the deep, symmetric Transformer backbone designed for unimodal text modeling, and apply the same computation uniformly to image and language tokens. This design overlooks a key modality asymmetry: image and text tokens differ substantially in information density…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-06-08 · Yutong Bian, Dongjie Cheng, Heming Xia, Yongqi Li, Wenjie Li
General AI
Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both textual rationales and …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.5
2026-06-09 · Xucong Wang, Ziyu Ma, Shidong Yang, Tongwen Huang, Pengkun Wang, Yong Wang, Xiangxiang Chu
General AI
Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent, a framework …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-06-15 · Wei Xu, Ke Yang, Gang Luo, Keli Zheng, Lingyan Hu, Jing Wang, Kefeng Li
Research Track A · General AI
Predictive modeling for clinical tabular data is central to clinical decision support and therefore requires not only strong predictive performance but also transparent decision logic. Although deep learning and tree-based ensemble methods can achieve high accuracy, their black-box nature remains a major obstacle to cl…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.4
2026-06-21 · Zhuoran Jin, Kejian Zhu, Hongbang Yuan, Yupu Hao, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
General AI
Chain-of-Thought (CoT) has become a standard method for improving reasoning capabilities in large language models (LLMs) by eliciting step-by-step thinking, but its effectiveness in multimodal tasks remains unclear. In this paper, we aim to systematically investigate the key question: What can multimodal Chain-of-Thoug…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.4
2026-06-23 · Chenhao Dang, Dantong Zhu, Jun Yang, Conghui He, Weijia Li
General AI
Multimodal misinformation detection is increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text-image framing errors. Existing benchmarks and methods remain poorly matched to this setting: they usually isolate short captions, single images, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-03-24 · Yenchia Feng, Chirag Sharma, Karime Maamari
Research Track B · General AI
Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows remains an open problem. In long-horizon settings, agents frequently suffer from cascading errors and environmental stochasticity; a single misstep in a dynamic interface can lead to task failure, resulting in h…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-03-26 · Vishal Narnaware, Animesh Gupta, Kevin Zhai, Zhenyi Wang, Mubarak Shah
General AI
Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet the architectures remain prone to multimodal hallucinations. This structural vulnerability stems from an algorithmic flaw: the decoder ranks candidate tokens based on textual likelihood without …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-03-30 · Kaushitha Silva, Srinath Perera
General AI
Large Language Models (LLMs) have demonstrated impressive capabilities in code generation. While an interactive feedback loop can improve performance, writing effective tests is a non-trivial task. Early multi-agent frameworks, such as AgentCoder, automated this process but relied on generated tests as absolute ground …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-03-30 · Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue
General AI
Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we pres…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-03-30 · Zimu Zhang, Yucheng Zhang, Xiyan Xu, Ziyin Wang, Sirui Xu, Kai Zhou, Bing Zhou, Chuan Guo, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui
General AI
Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior, finger articulation, contact timing, and inter-hand coordination, and existing resources lack high-fidelity bimanual …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-03-31 · Yang Shen, Zhenyi Yi, Ziyi Zhao, Lijun Sun, Dongyang Li, Chin-Teng Lin, Yuhui Shi
Research Track A · General AI
As AI agents evolve, the community is rapidly shifting from single Large Language Models (LLMs) to Multi-Agent Systems (MAS) to overcome cognitive bottlenecks in automated research. However, the optimal multi-agent coordination framework for these autonomous agents remains largely unexplored. In this paper, we present …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-06 · Lei Zhang, Junjiao Tian, Zhipeng Fan, Kunpeng Li, Jialiang Wang, Weifeng Chen, Markos Georgopoulos, Felix Juefei-Xu, Yuxiang Bao, Julian McAuley, Manling Li, Zecheng He
General AI
Humans paint images incrementally: they plan a global layout, sketch a coarse draft, inspect, and refine details, and most importantly, each step is grounded in the evolving visual states. However, can unified multimodal models trained on text-image interleaved datasets also imagine the chain of intermediate states? In…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-07 · Yuchi Wang, Haiyang Yu, Weikang Bian, Jiefeng Long, Xiao Liang, Chao Feng, Hongsheng Li
General AI
MLLMs have been successfully applied to multimodal embedding tasks, yet their generative reasoning capabilities remain underutilized. Directly incorporating chain-of-thought reasoning into embedding learning introduces two fundamental challenges. First, structural misalignment between instance-level reasoning and pairw…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-07 · Komal Kumar, Aman Chadha, Salman Khan, Fahad Shahbaz Khan, Hisham Cholakkal
General AI
The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) have demonstrated strong potential for understanding user intent and are being trained to utilize vari…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-14 · Yifan Du, Zikang Liu, Jinbiao Peng, Jie Wu, Junyi Li, Jinyang Li, Wayne Xin Zhao, Ji-Rong Wen
General AI
Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managing the heterogeneous information and high token costs associated with multimodal inputs over long horizons remains a critical challenge, as existing methods often suffe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-20 · Jinghui Lu, Jiayi Guan, Zhijian Huang, Jinlong Li, Guang Li, Lingdong Kong, Yingyan Li, Han Wang, Shaoqing Xu, Yuechen Luo, Fang Li, Chenxu Dang, Junli Wang, Tao Xu, Jing Wu, Jianhua Wu, Xiaoshuai Hao, Wen Zhang, Tianyi Jiang, Lingfeng Zhang, Lei Zhou, Yingbo Tang, Jie Wang, Yinfeng Gao, Xizhou Bu, Haochen Tian, Yihang Qiu, Feiyang Jia, Lin Liu, Yigu Ge, Hanbing Li, Yuannan Shen, Jianwei Cui, Hongwei Xie, Bing Wang, Haiyang Sun, Jingwei Zhao, Jiahui Huang, Pei Liu, Zeyu Zhu, Yuncheng Jiang, Zibin Guo, Chuhong Gong, Hanchao Leng, Kun Ma, Naiyang Wang, Guang Chen, Kuiyuan Yang, Hangjun Ye, Long Chen
General AI
Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive for real-time deployment. Latent CoT methods attempt to close this gap by compressing reasoning into continuous hidden states, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-22 · Naizhong Xu
Research Track A · General AI
Modern retrieval-augmented generation (RAG) systems treat vector embeddings as static, context-free artifacts: an embedding has no notion of when it was created, how trustworthy its source is, or which other embeddings depend on it. This flattening of knowledge has a measurable cost: recent work on VersionRAG reports t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-24 · Lihao Zheng, Zhenwei Shao, Yu Zhou, Yan Yang, Xintian Shen, Jiawei Chen, Hao Ma, Tao Wei
General AI
Although Multimodal Large Language Models (MLLMs) have advanced rapidly, they still face notable challenges in fine-grained multi-image understanding, often exhibiting spatial hallucination, attention leakage, and failures in object constancy. In addition, existing approaches typically rely on expensive human annotatio…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-24 · Jichao Wang, Liuyang Bian, Yufeng Zhou, Han Xiao, Yue Pan, Guozhi Wang, Hao Wang, Zhaoxiong Wang, Yafei Wen, Xiaoxin Chen, Shuai Ren, Lingfang Zeng
Research Track B · General AI
As Multimodal Large Language Models (MLLMs) mature, GUI agents are evolving from static interactions to complex navigation. While Reinforcement Learning (RL) has emerged as a promising paradigm for training MLLM agents on dynamic GUI tasks, its effective application faces a dilemma. Standard Offline RL often relies on …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-27 · Xihang Wang, Zihan Wang, Chengkai Huang, Quan Z. Sheng, Lina Yao
General AI
Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-27 · Mofei Li, Taozhi Chen, Guowei Yang, Jia Li
Research Track A · General AI
Large Language Models (LLMs) excel at general code generation, but their performance drops sharply in enterprise settings that rely on internal private libraries absent from public pre-training corpora. While Retrieval-Augmented Generation (RAG) offers a training-free alternative by providing static API documentation, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-30 · Binyan Xu, Xilin Dai, Kehuan Zhang
Research Track A · General AI
Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error with provable consequences for agent capability, long-term learning, and security. Retrie…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-04-30 · Sudong Wang, Weiquan Huang, Xiaomin Yu, Zuhao Yang, Hehai Lin, Keming Wu, Chaojun Xiao, Chen Chen, Wenxuan Wang, Beier Zhu, Yunjian Zhang, Chengwei Qin
General AI
The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (RLVR). However, SFT introduces distributional drift that neither preserves the model's original capabilities nor faithfully matc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-06-09 · Yunhan Jiang, Wenbin Duan, Shasha Guo, Liang Pang, Xiaoqian Sun, Huawei Shen
General AI
Memory is essential for enabling large language model (LLM) agents to handle long-horizon reasoning tasks. Existing memory mechanisms are largely centralized, typically organizing retrieved information and interaction history within a single model context. This design imposes a fundamental trade-off: scaling reasoning …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.3
2026-06-09 · Peiqi Jia, Haonan Jia, Ziqi Miao, Linkang Du, Yuntao Wang, Zhou Su
General AI
With the widespread deployment of Multimodal Large Language Models (MLLMs) in social interaction, understanding and controlling their behavior under complex personality conditions is essential. This paper introduces explicit personality conditioning and establishes a systematic evaluation framework encompassing single-…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.2
2026-05-28 · Tianpeng Bu, Xin Liu, Qihua Chen, Hao Jiang, Shurui Li, Hongtao Duan, Lu Jiang, Lulu Hu, Bin Yang, Minying Zhang
Research Track B · General AI
While GUI agents have advanced rapidly, they often lack the robustness to recover from their own errors, hindering real-world deployment. To bridge this gap at both the evaluation and data levels, we introduce GUI-RobustEval and propose Robustness-driven Trajectory Synthesis. GUI-RobustEval contains 1,216 executable te…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.2
2026-06-23 · Linpeng Huang, Weixing Chen, Zexin Chen, Yang Liu, Liang Lin
General AI
Recent advances in Video Large Language Models (Video-LLMs) have yielded promising performance on video question answering (VideoQA). Nevertheless, existing benchmarks are predominantly evaluated through answer correctness, while the grounding of predictions in relevant video evidence remains largely unexamined. This d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.2
2026-06-23 · Zixuan Li, Haokun Lin, Yicheng Xiao, Zhiwei Li, Xinyang Song, Zelong Zheng, Yong He, Heng Yao, Ke Ding, Chao Yu, Chuan Yuan, Qi Li, Zhenan Sun
General AI
Unified multi-modal large language models (MLLMs) have achieved strong text-to-image generation quality, but still struggle with structure-aware prompt following, where object counts, spatial relations, attribute bindings, and coarse layouts must be preserved. We attribute this limitation in part to the entanglement of…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.2
2026-06-24 · Liang-Yuan Wu, Zih-Ching Chen, Tongshuang Wu, Chao-Han Huck Yang, Hua Shen
General AI
As multimodal conversational systems increasingly engage in spoken interaction, their ability to navigate paralinguistic social cues has become a critical bottleneck for natural human-AI communication. However, existing evaluations of machine emotional intelligence assess reasoning exclusively through isolated text or …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-04-05 · Gunn Kim
Research Track A · General AI
Continual learning in artificial neural networks is fundamentally limited by the stability-plasticity dilemma: systems that retain prior knowledge tend to resist acquiring new knowledge, and vice versa. Existing approaches, most notably elastic weight consolidation (EWC), address this empirically without a physical ac…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-04-07 · Manuel Barusco, Francesco Borsatti, David Petrovic, Davide Dalle Pezze, Gian Antonio Susto
Research Track A · General AI
Visual Anomaly Detection (VAD) is a critical task for many applications including industrial inspection and healthcare. While VAD has been extensively studied, two key challenges remain largely unaddressed in conjunction: edge deployment, where computational resources are severely constrained, and continual learning, w…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-04-08 · Mohamed Rabie, Chinthana Panagamuwa, Konstantinos G. Kyriakopoulos
Research Track A
Reliable radar pulse classification is essential in Electromagnetic Warfare for situational awareness and decision support. Deep Neural Networks have shown strong performance in radar pulse and RF emitter recognition; however, on their own they struggle to efficiently learn new pulses and lack mechanisms for expressing…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-04-09 · Danit Yanowsky, Daphna Weinshall
Research Track A · General AI
Catastrophic forgetting remains a key challenge in Continual Learning (CL). In replay-based CL with severe memory constraints, performance critically depends on the sample selection strategy for the replay buffer. Most existing approaches construct memory buffers using embeddings learned under supervised objectives. Ho…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-04-19 · Ou Wu
Research Track A · General AI
Large language model optimization has historically bifurcated into isolated data-centric and model-centric paradigms: the former manipulates involved samples through selection, augmentation, or poisoning, while the latter tunes model weights via masking, quantization, or low-rank adaptation. This paper establishes a un…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-04-20 · Riccardo Casciotti, Francesco De Santis, Alberto Antonietti, Annamaria Mesaros
Research Track A
The human capacity for lifelong learning is an inspiration for deep learning methods, and in particular for continual learning. In this work, we apply Hebbian learning, a biologically inspired learning process, to sound classification. We propose a kernel plasticity approach that selectively modulates network kernels…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-05-02 · Wenhao Li, Xiu Su, Yichao Cao, Hongyan Xu, Xiaobo Xia, Shan You, Yi Chen, Chang Xu
Research Track A · General AI
Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning capability, lack of status monitoring, and difficulty in self-correction. In this…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 16.0
2026-05-09 · Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, Christopher G. Brinton
General AI
Large language model (LLM) agents face a structural tension: cloud agents provide strong reasoning but expose user data, while on-device agents preserve privacy at the cost of overall capability. Existing device-cloud designs treat this boundary as a compute split rather than a trust boundary suited to agentic workload…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-05-12 · Minjong Cheon
Research Track A · General AI
Catastrophic forgetting remains the central obstacle in continual learning (CL): parameters shared across tasks interfere with one another, and existing regularization methods such as EWC and SI apply uniform penalties without awareness of which input region a parameter serves. We propose KAN-CL, a continual learning f…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-05-12 · Neha Verma, Nikhil Mehta, Shao-Chuan Wang, Naijing Zhang, Alicia Tsai, Li Wei, Lukasz Heldt, Lichan Hong, Ed Chi, Xinyang Yi
Research Track A · General AI
Despite the rapid advancements in large language model (LLM) development, fine-tuning LLMs for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates and addresses this challenge in the context of the Generative Retrieval (GenRetrieval) t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-05-20 · Kei Hiroshima, Kento Uchida, Shinichi Shirakawa
Research Track A · General AI
Continual learning (CL) aims to train models sequentially on multiple tasks while mitigating catastrophic forgetting of previously learned knowledge. Recent advances in large pre-trained models (LPMs) and model merging techniques, such as MAGMAX, have demonstrated effective CL performance by combining task-specific par…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-05-27 · Elvin Hajizada, Michael Neumeier, Edward Paxon Frady, Yulia Sandamirskaya, Axel von Arnim, Bing Li, Eyke Hüllermeier
Research Track A · General AI
Recognizing and continuously learning novel human actions without forgetting prior classes is a requirement for emerging AR/VR and robotics applications. For these applications, both on-device processing and learning are essential for privacy and low-latency adaptation. Event cameras address the efficiency of visual se…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-05-28 · Kajetan Schweighofer, Conor F. Hayes, Roberto Dailey, Risto Miikkulainen, Xin Qiu
Research Track A · General AI
Evolution Strategies (ES) has recently emerged as a competitive alternative to reinforcement learning (RL) for large language model (LLM) fine-tuning, offering advantages through simplicity, scalability, and inference-only training. However, recent work suggests that ES fine-tuning on new tasks may induce forgetting of…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-06-09 · Bocheng Ju, Jianhua Wang, Chengliang Liu, Xiaolin Chang
Research Track A · General AI
Large language model unlearning aims to suppress designated undesirable knowledge while preserving benign capabilities. Many unlearning objectives focus on suppressing undesired answers, while recent target-guided variants specify replacement behavior but still leave update locality largely unconstrained. This paper in…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.9
2026-06-23 · Yujiang He, Frederic Uhrweiller, Bernhard Sick
Research Track A
Power forecasting models deployed in real-world energy markets must operate under nonstationary conditions, where data distributions continually evolve due to weather variability, infrastructure upgrades, and changing consumption behaviors. In practice, these models face strict operational constraints: historical data …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.8
2026-05-12 · Chen Li, Xiaoling Hu, Songzhu Zheng, Jiawei Zhou, Chao Chen
General AI
Large language models (LLMs) often produce answers with high certainty even when they are incorrect, making reliable confidence estimation essential for deployment in real-world scenarios. Verbalized confidence, where models explicitly state their confidence in natural language, provides a flexible and user-facing unce…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.8
2026-05-28 · Lukas Aichberger, Sepp Hochreiter
General AI
To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to autoregressive generation and thereby conflates internal computation with external communication. In contrast, human cogniti…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.8
2026-05-29 · Weile Chen, Bingchen Miao, Qifan Yu, Wendong Bu, Guoming Wang, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Siliang Tang
Research Track B · General AI
Recent advances in Multimodal Large Language Models (MLLMs) have led to promising progress in web agents. However, existing web agents often rely on handcrafted execution pipelines or expensive expert trajectories, limiting their adaptability to complex, dynamic environments. To address these challenges, we propose SCA…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.8
2026-06-29 · Cao-Tri Nguyen, Nguyen-Khoa Luong, Vinh-Tiep Nguyen, Minh-Triet Tran
General AI
Photographs frequently contain visual distractors besides foregrounds and backgrounds of the intended subject, competing for attention and weakening composition. While modern editing tools streamline object removal, identifying which objects to remove remains a mostly manual process. Existing saliency models and…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.8
2026-06-30 · Runyu Lu, Yubo Wu, Ethan Kou, Letian Fu, Wenli Xiao, Ajay Mandlekar, Yinzhen Xu, Guanya Shi, Ken Goldberg, Ang Chen, Mosharaf Chowdhury, Yuke Zhu, Linxi "Jim" Fan, Guanzhi Wang
Research Track A · General AI
Traditional robot programming is challenging: it requires orchestrating multimodal perception, managing physical contact dynamics, and handling diverse configurations and execution failures. We introduce ASPIRE (Agentic Skill Programming through Iterative Robot Exploration), a continual learning system that autonomousl…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.8
2026-07-02 · Zhilin Wang, Han Song, Runzhe Zhan, Jusen Du, Jiacheng Chen, Tianle Li, Qingyu Yin, Yulun Wu, Zhennan Shen, Tong Zhu, Yanshu Li, Guanjie Chen, Derek F. Wong, Yafu Li, Yu Cheng, Yang Yang
General AI
Autonomous agents are increasingly expected to improve executable policies through feedback, yet existing evaluations often collapse this process into a final score or confound it with open-ended software-engineering progress. We introduce Autonomous Policy Evolution, a controlled evaluation setting in which a harness-…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.6
2026-07-01 · Jiatong Li, Samuel Yeh, Sharon Li
Research Track A · General AI
Recurrent memory agents extend LLMs to arbitrarily long contexts by iteratively consolidating input into a fixed-size memory window. Despite their scalability, these agents exhibit a well-documented reliability problem: end-to-end performance degrades systematically as context length grows. We diagnose this failure by …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.6
2026-07-02 · Jingtao Xu, Zizhuo Lin, Jianwen Sun, Yi Yang, Yawei Luo
General AI
While Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in standard visual understanding, adapting them for active visual search in 360° panoramic environments exposes fundamental limitations. Specifically, standard MLLMs struggle to effectively model inherent panoramic properti…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.6
2026-07-02 · Junyi Wen, Ruiyan Zhuang, Yongjia Xu, Pengtu Li, Rui Zou, Hongyi Chen, Chingman Wan, Puxu Yang, Wuhui Chen, Yanlin Wang
General AI
Developing high-performance kernels for Neural Processing Units (NPUs) is a critical industry bottleneck, requiring developers to manually navigate implicit hardware constraints and strict memory hierarchies. While large language models offer immense automation potential, they fail catastrophically on NPUs due to a fun…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.6
2026-07-02 · Francesca Pistilli, Simone Alberto Peirone, Giuseppe Averta
General AI
Understanding human behavior while interacting with the surrounding world is crucial for many applications of embodied AI. First-person videos are particularly informative for this problem, as they capture well how activities reshape the scene over time. However, existing approaches often rely on implicit visual or lan…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.6
2026-07-02 · Dazhi Fu, Jiuding Yang, Yiwen Guo, Jicong Fan
General AI
Reliable reward and preference signals are critical for evaluating and optimizing large language models on open-ended tasks. Rubric-based judges offer a transparent way to decompose such judgments into explicit evaluation criteria, but existing annotation-free rubric generators typically rely on a single generic evalua…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-03-17 · Jian Yang, Wei Zhang, Shawn Guo, Zhengmao Ye, Lin Jing, Shark Liu, Yizhi Li, Jiajun Wu, Cening Liu, X. Ma, Yuyang Song, Siwei Wu, Yuwen Li, L. Liao, T. Zheng, Ziling Huang, Zelong Huang, Che Liu, Yan Xing, Renyuan Li, Qingsong Cai, Hanxu Yan, Siyue Wang, Shikai Li, Jason Klein Liu, An Huang, Yongsheng Kang, Jinxing Zhang, Chuan Hao, Haowen Wang, Weicheng Gu, Ran Tao, Mingjie Tang, Peihao Wu, Jianzhou Wang, Xianglong Liu, Weifeng Lv, Bryan Dai
General AI
In this report, we introduce the IQuest-Coder-V1 series (7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic through different phases of the pipe…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-03-26 · Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang, Chao Zhang, Chen Zhang, Yuhang Zang, Fei Yuan, Jiakang Yuan, Jiashuo Yu, Jinhui Yin, Haochen Ye, Qian Yao, Bowen Yang, Danni Yang, Kaichen Yang, Ziang Yan, Jun Xu, Yicheng Xu, Wanghan Xu, Xuenan Xu, Chao Xu, Ruiliang Xu, Shuhao Xing, Long Xing, Xinchen Xie, Ling-I Wu, Zijian Wu, Zhenyu Wu, Lijun Wu, Yue Wu, Jianyu Wu, Wen Wu, Fan Wu, Xilin Wei, Qi Wei, Bingli Wang, Rui Wang, Ziyi Wang, Zun Wang, Yi Wang, Haomin Wang, Yizhou Wang, Lintao Wang, Yiheng Wang, Longjiang Wang, Bin Wang, Jian Tong, Zhongbo Tian, Huanze Tang, Chen Tang, Shixiang Tang, Yu Sun, Qiushi Sun, Xuerui Su, Qisheng Su, Chenlin Su, Demin Song, Jin Shi, Fukai Shang, Yuchen Ren, Pengli Ren, Xiaoye Qu, Yuan Qu, Jiantao Qiu, Yu Qiao, Runyu Peng, Tianshuo Peng, Jiahui Peng, Qizhi Pei, Zhuoshi Pan, Linke Ouyang, Wenchang Ning, Yichuan Ma, Zerun Ma, Ningsheng Ma, Runyuan Ma, Chengqi Lyu, Haijun Lv, Han Lv, Lindong Lu, Kuikun Liu, Jiangning Liu, Yuhong Liu, Kai Liu, Hongwei Liu, Zhoumianze Liu, Mengjie Liu, Ziyu Liu, Wenran Liu, Yang Liu, Liwei Liu, Kaiwen Liu, Junyao Lin, Junming Lin, Tianyang Lin, Dahua Lin, Jianze Liang, Linyang Li, Peiji Li, Zonglin Li, Zehao Li, Pengze Li, Guoyan Li, Lingkai Kong, Linglin Jing, Zhenjiang Jin, Feifei Jiang, Qian Jiang, Junhao Huang, Zixian Huang, Haian Huang, Zhouqi Hua, Han Hu, Linfeng Hou, Yinan He, Conghui He, Tianyao He, Xu Guo, Qipeng Guo, Aijia Guo, Yuzhe Gu, Lixin Gu, Jingyang Gong, Qiming Ge, Jiaye Ge, Songyang Gao, Jianfei Gao, Xinyu Fang, Caihua fan, Yue Fan, Yanhui Duan, Zichen Ding, Shengyuan Ding, Xuanlang Dai, Erfei Cui, Ganqu Cui, Pei Chu, Tao Chu, Guangran Cheng, Yu Cheng, Kai Chen, Yongkang Chen, Chiyu Chen, Guanzhou Chen, Qiaosheng Chen, Sitao Chen, Xin Chen, Haojiong Chen, Yicheng Chen, Weihan Cao, Yuhang Cao, Qinglong Cao, Lei Bai
General AI
We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is aug…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.5
2026-03-29 · Chongyang Zhao, Mingsong Li, Haodong Lu, Dong Gong
Research Track A · General AI
Multimodal Continual Instruction Tuning aims to continually enhance Large Vision Language Models (LVLMs) by learning from new data without forgetting previously acquired knowledge. Mixture of Experts (MoE) architectures naturally facilitate this by incrementally adding new experts and expanding routers while keeping th…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-03-31 · Shuang Chen, Quanxin Shou, Hangting Chen, Yucheng Zhou, Kaituo Feng, Wenbo Hu, Yi-Fan Zhang, Yunlong Lin, Wenxuan Huang, Mingyang Song, Dasen Dai, Bolin Jiang, Manyuan Zhang, Shi-Xue Zhang, Zhengkai Jiang, Lucas Wang, Zhao Zhong, Yu Cheng, Nanyun Peng
General AI
Unified multimodal models provide a natural and promising architecture for understanding diverse and complex real-world knowledge while generating high-quality images. However, they still rely primarily on frozen parametric knowledge, which makes them struggle with real-world image generation involving long-tail and kn…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.5
2026-04-03 · Lei Song, Shihan Guan, Youyong Kong
Research Track A · General AI
Non-Exemplar Continual Graph Learning (NECGL) seeks to eliminate the privacy risks intrinsic to rehearsal-based paradigms by retaining solely class-level prototype representations rather than raw graph examples for mitigating catastrophic forgetting. However, this design choice inevitably precipitates feature drift. As…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-04-04 · Yuqi Zhu, Jintian Zhang, Zhenjie Wan, Yujie Luo, Shuofei Qiao, Zhengke Gui, Da Zheng, Lei Liang, Huajun Chen, Ningyu Zhang
General AI
Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.5
2026-04-06 · Seoyoung Park, Haemin Lee, Hankook Lee
Research Track A · General AI
Task-free online continual learning has recently emerged as a realistic paradigm for addressing continual learning in dynamic, real-world environments, where data arrive in a non-stationary stream without clear task boundaries and can only be observed once. To consider such challenging scenarios, many recent approaches…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.5
2026-04-09 · Zhuang Qi, Ying-Peng Tang, Lei Meng, Guoqing Chao, Lei Wu, Han Yu, Xiangxu Meng
Research Track A
Exemplar replay has become an effective strategy for mitigating catastrophic forgetting in federated continual learning (FCL) by retaining representative samples from past tasks. Existing studies focus on designing sample-importance estimation mechanisms to identify information-rich samples. However, they typically ove…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-04-21 · Fan Li, Chonghuinan Wang, Lina Lei, Yuping Qiu, Jiaqi Xu, Jiaxiu Jiang, Xinran Qin, Zhikai Chen, Fenglong Song, Zhixin Wang, Renjing Pei, Wangmeng Zuo
General AI
Common image editing tasks typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Meanwhile, although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from H…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-04-21 · Bobo Li, Rui Wu, Zibo Ji, Meishan Zhang, Hao Fei, Min Zhang, Mong-Li Lee, Wynne Hsu
General AI
Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-04-21 · Qihua Dong, Gozde Sahin, Pei Wang, Zhaowei Cai, Robik Shrestha, Hao Yang, Davide Modolo
General AI
In this paper, we investigate how Multimodal Large Language Models can effectively master tool use to solve complex visual reasoning tasks. To achieve that, we propose a novel Tool-supervised Reinforcement Learning (ToolsRL) framework, with direct tool supervision for more effective tool-use learning.…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-04-29 · Jinbiao Wei, Kangqi Ni, Yilun Zhao, Guo Gan, Arman Cohan
General AI
Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-04-30 · Qiyao Wang, Haoran Hu, Longze Chen, Hongbo Wang, Hamid Alinejad-Rokny, Yuan Lin, Min Yang
General AI
With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially for well-structured, information-rich inputs and static execution set…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.5
2026-05-07 · Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan
Research Track B · General AI
The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-05-28 · Ziyan Liu, Zhezheng Hao, Yeqiu Chen, Hong Wang, Jingren Hou, Ruiyi Ding, Yongkang Yang, Wence Ji, Wei Xia, Feng Liu
General AI
Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. However, existing approaches typically train these memory policies using outcome-based reinforcement learning, failing to localize where intermediate memory quality degrades. As interac…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-06-02 · Zherui Yang, Fan Liu, Yansong Ning, Hao Liu
General AI
Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and lack of principled long-horizon context management, hindering their ability to accumulate reusable experience across ta…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.5
2026-06-16 · Guibin Zhang, Xun Xu, Yanwei Yue, Zikun Su, Wangchunshu Zhou, Xiaobin Hu, Shuicheng Yan
Research Track A · General AI
Memory has become a standard substrate for self-evolving agents, yet retaining experience is not the same as learning how to evolve through it. Existing memory agents can store trajectories, retrieve reflections, or accumulate skills, but often lack the holistic competence to select useful experience, act on it, write …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.5
2026-06-29 · Rahul Khedar, Mayank Malhotra, Avinash Karn, Mouli V, Prakhar Mehrotra
Research Track B · General AI
Live product demonstrations are a recurring, high-cost activity in software organizations: a human presenter must select features, dispatch the corresponding interactions on a running application, narrate them coherently, and answer questions in real time. Existing automation addresses only fragments -- generalist brow…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.4
2026-06-23 · Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt
General AI
"Talk short. Drop grammar. Save token." This caveman style is widely promoted as a way to cut inference cost, but whether it actually saves anything depends on which channel (the user's prompt or the model's response) is being compressed. We present Cavewoman, a two-channel evaluation protocol that scores every generat…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.4
2026-06-24 · Jiayu Li, Yixiao Fang, Tianyu Hu, Wei Cheng, Ping Huang, Zheheng Fan, Gang Yu, Xingjun Ma
General AI
Real-world photography requires capture-time guidance for both camera framing and subject pose. Yet existing aesthetic cropping benchmarks mainly evaluate post-hoc crop prediction and overlook subject-side recommendations, leaving the capture-time guidance capabilities of multimodal large language models (MLLMs) undere…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-03-30 · Songjun Tu, Chengdong Xu, Qichao Zhang, Yaocheng Zhang, Xiangyuan Lan, Linjing Li, Dongbin Zhao
General AI
Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that o…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-03-31 · Davide Di Gioia
General AI
Current autonomous AI agents, driven primarily by Large Language Models (LLMs), operate in a state of cognitive weightlessness: they process information without an intrinsic sense of network topology, temporal pacing, or epistemic limits. Consequently, heuristic agentic loops (e.g., ReAct) can exhibit failure modes in …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-02 · Payal Fofadiya, Sunil Tiwari
Research Track A · General AI
Long-horizon conversational agents require persistent memory for coherent reasoning, yet uncontrolled accumulation causes temporal decay and false memory propagation. Benchmarks such as LOCOMO and LOCCO report performance degradation from 0.455 to 0.05 across stages, while MultiWOZ shows 78.2% accuracy with 6.8% false …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-02 · Xueying Li, Feng Lyu, Hao Wu, Mingliu Liu, Jia-Nan Liu, Guozi Liu
General AI
Training-free Vision-Language Navigation (VLN) agents powered by foundation models can follow instructions and explore 3D environments. However, existing approaches rely on greedy frontier selection and passive spatial memory, leading to inefficient behaviors such as local oscillation and redundant revisiting. We argue…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-06 · Sixun Dong, Juhua Hu, Steven Li, Wei Wen, Qi Qian
General AI
Most vision-language models (VLMs) apply a large language model (LLM) as the decoder, where the response tokens are generated sequentially through autoregression. Therefore, the number of output tokens can be the bottleneck of the end-to-end latency. However, different models may require vastly different numbers of out…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-09 · Yifang Wang, Rui Sheng, Erzhuo Shao, Yifan Qian, Haotian Li, Nan Cao, Dashun Wang
General AI
Large language models (LLMs) are transforming scientific workflows, not only through their generative capabilities but also through their emerging ability to use tools, reason about data, and coordinate complex analytical tasks. Yet in most human-AI collaborations, the primary outputs, figures, are still treated as sta…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-13 · Stefan Miteski
Research Track A · General AI
Retrieval-Augmented Generation remains the dominant pattern for giving LLMs persistent memory, but a visible cluster of personal wiki-style memory architectures emerged in April 2026 -- design proposals from Karpathy, MemPalace, and LLM Wiki v2 that compile knowledge into an interlinked artifact for long-term use by a …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-14 · Sohyun An, Shuibenyang Yuan, Hayeon Lee, Cho-Jui Hsieh, Alexander Min
General AI
Reinforcement Learning (RL) has shown strong potential for optimizing search agents in complex information retrieval tasks. However, existing approaches predominantly rely on gold supervision, such as ground-truth answers, which is difficult to scale. To address this limitation, we propose Cycle-Consistent Search (CCS)…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-16 · Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Huangyu Dai, Lingtao Mao, Xuxin Zhang, Chenyi Lei, Wenwu Ou
General AI
Reinforcement learning has emerged as an effective paradigm for training large language models to perform search-augmented reasoning. However, existing approaches rely on trajectory-level rewards that cannot distinguish precise search queries from vague or redundant ones within a rollout group, and collapse to a near-z…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-16 · Huanran Hu, Zihui Ren, Dingyi Yang, Liangyu Chen, Qixiang Gao, Tiezheng Ge, Qin Jin
General AI
Real-world video creation often involves a complex reasoning workflow of selecting relevant shots from noisy materials, planning missing shots for narrative completeness, and organizing them into coherent storylines. However, existing benchmarks focus on isolated sub-tasks and lack support for evaluating this full proc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-16 · Hao Gao, Shaoyu Chen, Yifan Zhu, Yuehao Song, Wenyu Liu, Qian Zhang, Xinggang Wang
General AI
High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities and the lack of cor…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-20 · Andrew Zhang, Tong Ding, Sophia J. Wagner, Caiwei Tian, Ming Y. Lu, Rowland Pettit, Joshua E. Lewis, Alexandre Misrahi, Dandan Mo, Long Phi Le, Faisal Mahmood
General AI
Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-22 · Yuxuan Cai, Jie Zhou, Qin Chen, Liang He, Wei Li, Xin Li, Bo Zhang
Research Track A · General AI
Online lifelong learning enables agents to accumulate experience across interactions and continually improve on long-horizon tasks. However, existing methods typically treat retrieval from past experience as a passive operation, triggering it only at task initialization or after completing a step. Consequently, agents …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-22 · Inclusion AI, Tiwei Bie, Haoxing Chen, Tieyuan Chen, Zhenglin Cheng, Long Cui, Kai Gan, Zhicheng Huang, Zhenzhong Lan, Haoquan Li, Jianguo Li, Tao Lin, Qi Qin, Hongjun Wang, Xiaomei Wang, Haoyuan Wu, Yi Xin, Junbo Zhao
General AI
We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous vi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-22 · Qiguang Chen, Chengyu Luan, Jiajun Wu, Qiming Yu, Yi Yang, Yizhuo Li, Jingqi Tong, Xiachong Feng, Libo Qin, Wanxiang Che
General AI
Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Nevertheless, current Olympiad-level multimodal reasoning benchmarks for these models often emphasize single-image analysis and fail to exploit contextual information across multiple images. We present OMIBench…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-22 · Dongding Lin, Jian Wang, Yongqi Li, Wenjie Li
General AI
Situated conversational recommendation (SCR), which utilizes visual scenes grounded in specific environments and natural language dialogue to deliver contextually appropriate recommendations, has emerged as a promising research direction due to its close alignment with real-world scenarios. Compared to traditional reco…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-23 · Praval Sharma
General AI
Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios. However, existing event extraction approaches have limitations: (1) closed-domain algorithms are restricted to predefined event types and thus rarely generaliz…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-23 · Chee Wei Tan, Yuchen Wang, Shangxin Guo
General AI
This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy L…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-24 · Jinghong Chen, Jingbiao Mei, Guangyu Yang, Bill Byrne
General AI
A common approach to question answering with retrieval-augmented generation (RAG) is to concatenate documents into a single context and pass it to a language model to generate an answer. While simple, this strategy can obscure the contribution of individual documents, making attribution difficult and contributing to th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-28 · Jianghao Lin, Zi Ling, Chenyu Zhou, Tianyi Xu, Ruoqing Jiang, Zizhuo Wang, Dongdong Ge
General AI
Optimization modeling underpins real-world decision-making in logistics, manufacturing, energy, and public services, but reliably solving such problems from natural-language requirements remains challenging for current large language models (LLMs). In this paper, we propose Agora-Opt, a modular agentic framework…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-28 · Guanglin Niu, Bo Li
General AI
Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. Therefore, we…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-28 · Xiyuan Yang, Jiaru Zou, Rui Pan, Ruizhong Qiu, Pan Lu, Shizhe Diao, Jindong Jiang, Hanghang Tong, Tong Zhang, Markus J. Buehler, Jingrui He, James Zou
General AI
Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend this scaling principle from a single model to multi-agent systems, and ask: Can agent collaboration itself be scaled through recursion? To …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-28 · Ran Gu, Benjamin Hou, Mélanie Hébert, Asmita Indurkar, Yifan Yang, Emily Y. Chew, Tiarnán D. L. Keenan, Zhiyong Lu
General AI
Despite strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multimodal large language models (MLLMs) integrate diagnostic predictions with clinically meaningful dialogue to support clin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-29 · Saber Zerhoudi, Michael Granitzer, Jelena Mitrovic
General AI
Training trustworthy agentic LLMs requires data that shows the grounded reasoning process, not just the final answer. Existing datasets fall short: question-answering data is outcome-only, chain-of-thought data is not tied to specific documents, and web-agent datasets track interface actions rather than the core retrie…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-29 · Wanrong Zheng, Yunhao Ge, Laurent Itti
General AI
Breakthrough progress in vision-based navigation through unknown environments has been achieved by using multimodal large language models (MLLMs). These models can plan a sequence of motions by evaluating the current view at each time step against the task and goal given to the agent. However, current zero-shot Vision-…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-04-30 · Yanting Wang, Chenlong Yin, Ying Chen, Jinyuan Jia
General AI
Long-context large language models (LLMs), for example Gemini-3.1-Pro and Qwen-3.5, are widely used to empower many real-world applications, such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-05-01 · Yawen Qin, Ke Qiu, Qin Zhang
General AI
Dance serves as both a cultural cornerstone and a medium for personal expression, yet the rapid growth of online dance content has made personalized discovery increasingly difficult. Text-based dance retrieval offers a natural interface for users to search with choreographic intent, but it remains underexplored because…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-05-01 · Arunabh Srivastava, Mohammad A. Khojastepour, Srimat Chakradhar, Sennur Ulukus
General AI
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-05-22 · Jonah R. Donaldson, Aliya Navaz, Konstantinos Doran, Alysta Lim, Mario Campanelli
General AI
The rapid advancement of Large Language Models (LLMs) has introduced new possibilities and challenges in physics education, necessitating rigorous evaluation of their capabilities as both problem solvers and automated assessors. This paper presents the results of three complementary studies that evaluated frontier mode…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-05 · Chengkai Zhang, Ziteng Liu, Junpu Wang, Zeyi Tao, Yang Wang, Sagar Chordia, Qin Huang
General AI
Targeted post-training aims to improve reasoning, math, and code without degrading strengths. Low-rank adapters are efficient but task-global; activation interventions are input-aware but often require separate probes, vectors, or inference-time steering. We introduce TALAN (Task-Aligned Latent Adaptation Networks), a …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-11 · Seokju Cho, Ryo Hachiuma, Abhishek Badki, Hang Su, Byung-Kwan Lee, Chan Hee Song, Sifei Liu, Subhashree Radhakrishnan, Seungryong Kim, Yu-Chiang Frank Wang, Min-Hung Chen
General AI
Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is bounded by the actio…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-14 · Ali Sarabadani, Mahtab Tajvidiyan
Research Track A · General AI
Large Language Models (LLMs) struggle to incorporate new knowledge without forgetting or costly retraining. We propose DYNA, a lightweight framework that augments a frozen LLM with a temporal knowledge graph where events are nodes and temporal relations are directed, timestamped edges. The graph serves as an external, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-15 · Sanjay Basu
General AI
Aggregate accuracy benchmarks conceal a systematic structure in how large language models fail at electronic health record (EHR) question answering: questions requiring more inferential steps produce disproportionately more errors. Motivated by theoretical results on transformer compositionality limits, we introduce a …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-15 · Patomporn Payoungkhamdee, Napat Laosaengpha, Jenta Wonglertsakul, Pittawat Taveekitworachai, Pume Tuchinda, Panjapong Poobanchuen, Ekapol Chuangsuwanich, Can Udomcharoenchaikit, Samuel Cahyawijaya, Peerat Limkonchotiwat, Sarana Nutanong
General AI
Reasoning with a Code Interpreter (CI) has emerged as an effective paradigm for enhancing the reasoning capabilities of large language models (LLMs) through executable computation and iterative verification. Despite its growing adoption, the behavioral properties underlying effective code reasoning remain largely under…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-16 · Yifu Luo, Zeyu Chen, Haoyu Wang, Xinhao Hu, Yuxuan Zhang, Zhizhou Sha, Shiwei Liu
General AI
On-policy self-distillation (OPSD) has proven effective for post-training large language models (LLMs), yet its application to diffusion LLMs (dLLMs) remains unexplored. Existing OPSD methods are inherently autoregressive-centric. They inject privileged information via left-to-right prefix conditioning with token-level…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-16 · Michèle Finck
General AI
Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-17 · Ruida Wang, Rui Pan, Pengcheng Wang, Shizhe Diao, Tong Zhang
General AI
Enhancing the formal math reasoning capabilities of Large Language Models (LLMs) has become a key focus in both mathematical and computer science communities in recent years. While significant progress has been made in using state-of-the-art Auto-Regressive (AR) LLMs for formal theorem proving, these models suffer from…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-17 · Leyang Shen, Yang Zhang, Xiaoyan Zhao, Chun Kai Ling, Tat-Seng Chua
General AI
Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, this divide-and-conquer paradigm falls short on decision-making tasks that are also prevalent in the real world. These tas…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.2
2026-06-23 · Ali Pourghasemi Fatideh, Wilder Baldwin, Maria Dhakal, Collin McMillan, Sepideh Ghanavati
General AI
LLM-based dialogue assistants have become mainstream tools for software developers, yet current evaluation benchmarks focus exclusively on functional correctness. This leaves a critical gap in assessing the quality and accuracy of these conversations when handling Non-Functional Requirements (NFRs), which are inherentl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-04-08 · Ziqiao Ma, Xueyang Yu, Haoyu Zhen, Yuncong Yang, Joyce Chai, Chuang Gan
Research Track A · General AI
Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-04-14 · Yifei Yan, Linqi Ye
Research Track A · General AI
As reinforcement learning for humanoid robots evolves from single-task to multi-skill paradigms, efficiently expanding new skills while avoiding catastrophic forgetting has become a key challenge in embodied intelligence. Existing approaches either rely on complex topology adjustments in Mixture-of-Experts (MoE) models…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-04-22 · Saish Sachin Shinde
Research Track A · General AI
We present SCM (Sleep-Consolidated Memory), a research preview of a memory architecture for large language models that draws on neuroscientific principles to address a fundamental limitation in current systems: the absence of persistent, structured, and biologically plausible memory. Existing approaches rely on truncat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-04 · Ayushman Trivedi, Bhavika Melwani
Research Track A
Catastrophic forgetting is commonly interpreted as the irreversible erasure of previously acquired knowledge during sequential learning. In this work, we investigate an alternative perspective: that forgetting may arise not from complete destruction of task representations but from a loss of accessibility to preserved …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-04 · Hongye Xu, Bartosz Krawczyk
Research Track A · General AI
Exemplar-free class-incremental learning (EFCIL) aims to acquire new classes over time without storing raw data. Historically, prototype rehearsal, which samples around stored class prototypes and mixes them with current-task data, has been a popular strategy to reduce catastrophic forgetting. However, recent drift-com…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-04 · Hongye Xu, Bartosz Krawczyk
Research Track A · General AI
Continual learning (CL) seeks models that acquire new skills without erasing prior knowledge. In exemplar-free class-incremental learning (EFCIL), this challenge is amplified because past data cannot be stored, making representation drift for old classes particularly harmful. Prototype-based EFCIL is attractive for its…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-06 · Daoqing Wang, Yuchen Xiao, Weixuan Huang, Zhilong Zhang, Shenghua Wan, Meng Li, Lei Yuan, Yang Yu
Research Track A · General AI
Multi-quadruped coordination has attracted increasing attention due to its enhanced payload capacity, broader contact coverage, and improved adaptability to challenging tasks. Existing methods for multi-quadruped manipulation typically focus on predefined or closed task families, often relying on multi-agent reinforcem…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.0
2026-06-11 · Shihao Xu, Tiancheng Zhou, Jiatong Ma, Yanli Ding, Yiming Yan, Ming Xiao, Guoyi Li, Haiyang Geng, Yunyun Han, Jianhua Chen, Yafeng Deng
General AI
Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment. Progress in AI-assisted psychiatric diagnosis is constrained by the absence of benchmarks that simu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-11 · Minlin Zeng, Zhipeng Zhou, Yang Qiu, Martin J. McKeown, Zhiqi Shen
Research Track A · General AI
Gait-based Parkinson's disease assessment increasingly relies on heterogeneous sensors, but clinical systems rarely collect all modalities simultaneously. New sensors may arrive through device upgrades, protocol changes, or multi-center deployment, while historical patient data are often unavailable because of privacy …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-11 · Ayushman Trivedi, Bhavika Melwani
Research Track A · General AI
Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverability in continual learning. Using Split CIFAR-100 and a sequentially trained ResNet-18, we analyze …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-12 · Salimeh Sekeh, Mary Wisell
Research Track A · General AI
Continual vision-language models are commonly addressed through sequential fine-tuning; however, although this paradigm enables adaptation to new environments (tasks), it inherently emphasizes the contribution of previously learned environments (tasks) at the expense of the stability required to preserve previously acq…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-29 · Naeem Paeedeh, Mahardhika Pratama, Wolfgang Mayer, Mukesh Prasad, Weiping Ding, Yew-Soon Ong
Research Track A · General AI
Existing domain-incremental learning (DIL) strategies call for massive amounts of data to adapt to new domains and suffer from the overfitting problem in the case of data scarcity. This paper puts forward a relatively uncharted problem, namely, few-shot domain incremental learning (FSDIL), taking into account the probl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.9
2026-06-20 · Rishi Srivastava
Research Track B · General AI
We introduce CFAgentBench, a reproducible, self-hostable environment and benchmark for autonomous construction-finance agents: a CFO/controller-class agent operating across the real software stack a US construction finance team runs - ERP, project management, email, documents, pay applications, payroll, certified payro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-07 · Mingwei Xu, Hao Fang
General AI
Reinforcement learning with verifiable rewards (RLVR), owing to its deterministic verification, has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy Optimization (GRPO)…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-07 · Ziyu Zhai, Siyou Li, Juexi Shao, Juntao Yu
General AI
Developing ceramic glazes is a costly, time-consuming process of trial and error due to complex chemistry, placing a significant burden on independent artists. While recent advances in multimodal AI offer a modern solution, the field lacks the large-scale datasets required to train these models. We propose GlazyBench, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-07 · Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee
General AI
LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-12 · Yanting Miao, Yutao Sun, Dexin Wang, Mengyu Zhou, Pascal Poupart, Lei Lv, Qi Zhao, Li Wang, Hao Li, Xiaoxi Jiang, Guanjun Jiang
General AI
Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, avoiding external tools or image generators. However, existing methods usually follow an output-as-input latent paradigm and yield unstable gains. We identify evidence for a feature-space mism…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-12 · Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran, Valeriu Lacatusu, Sylvestre-Alvise Rebuffi, Alexandre Mourachko, Surya Parimi, Christophe Ropers, Rashel Moritz, Vanessa Stark, Hady Elsahar, Pierre Fernandez
General AI
We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as specula…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-21 · Ruofan Jin, Zaixi Zhang
General AI
Vision-Language-Action (VLA) models have emerged as a promising paradigm for robotic manipulation by leveraging pre-trained vision-language representations. However, current VLA training methods suffer from two critical limitations: poor generalization to novel environments and low training efficiency requiring extensi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-22 · Beichen Zhang, Yuhong Liu, Jinsong Li, Yuhang Zang, Jiaqi Wang, Dahua Lin
General AI
Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grained focus or view transformations. The "think with images" paradigm narrows this gap, but existing approaches are either constrained by fixed predefined toolk…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-28 · Feng Han, Zhixiong Zhang, Zheming Liang, Yibin Wang, Jiaqi Wang
General AI
Vision-Language Models (VLMs) have achieved substantial progress across a wide range of understanding and reasoning tasks, driven by large-scale image-text training aimed at multimodal fusion. Ideally, replacing a textual question with its rendered-image counterpart should leave model performance essentially unaffected…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-28 · Valentina Bui Muti, Eugénie Dulout, Ziquan Fu
General AI
Large language models (LLMs) show promise for clinical reasoning and decision support, but evaluation in realistic, electronic health record-congruent settings remains limited. Existing benchmarks often rely on static datasets or unstructured inputs that do not reflect the structured, interoperable data formats used in…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-28 · Shuaidi Wang, Zhan Zhuang, Ruping Huang, Yu Zhang
General AI
Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive generative paradigm. Given the prohibitive computational cost of full fine-tuning, Parameter-Efficient Fine-Tuning (PEFT) has become the standard approach. However, existing PEFT methods (e.g., LoRA), originally tailored for autoregr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-05-28 · Amrita Mazumdar, Seonwook Park, Rajarshi Roy, Nikhil Srihari, Shengze Wang, Yuhao Zhou, Julia Wang, Koki Nagano, Shalini De Mello
General AI
Natural human conversation is full-duplex and audio-visual: people simultaneously speak and listen while continuously interpreting and producing nonverbal cues, such as nods, smiles, and gestures. To support successful human-agent interaction, agents must model full-duplex audiovisual conversation; however, existing fu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-06-18 · To Eun Kim, Xuhong He, Dishank Jain, Ambuj Agrawal, Negar Arabzadeh, Fernando Diaz
Research Track B · General AI
The decentralized deployment of LLM agents with diverse capabilities across diverse tasks motivates infrastructure for knowledge sharing across heterogeneous agent populations. Just as search engines index human-generated artifacts to support human problem solving, retrieval systems can organize agent-generated artifac…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-06-29 · Yuhong Deng, Yuyao Liu, David Hsu
General AI
Can the robot use a plate to cut a cake if no knife is available? Tool use greatly expands robot capabilities, but to use tools creatively beyond their intended functions, the robot faces the challenge of open-world affordance grounding: select an open-category object to act as a tool and localize its specif…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-06-29 · Jiamei Jiang, Jiajing Zhang, Feifei Mo, Linjing Li, Daniel Zeng
General AI
Planning often requires symbolic specifications that are both executable and verifiable. For large language models deployed in autonomous or decision-support systems, failures in such formalization may lead to unverifiable decisions, execution failures, or unsafe downstream behavior. We present NL-PDDL-Bench, a multi-d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-06-30 · Prakhar Dixit, Tim Oates
Research Track A · General AI
We propose Intelligent Schema Memory (ISM), a self-evolving memory-augmented system that improves mathematical reasoning for a frozen LLM under continual learning with hard episodic resets. ISM maintains a compact, self-refined bank of strategy schemas learned from both successful and failed episodes, with symbolic too…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.8
2026-06-30 · Gaurab Baral, Aaditya Khanal, Yangyang Tao, Junxiu Zhou
General AI
This paper investigates knowledge distillation from a large reasoning model (DeepSeek-R1) to a compact student model (Qwen2.5-7B). Using historical problems from the John O'Bryan Mathematics Competition at Northern Kentucky University (2011-2025), we build a Chain-of-Thought (CoT) training corpus through a dual-agent f…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.6
2026-07-02 · Emmanuel George, Christopher Keefe, Peter Pak, Amir Barati Farimani
General AI
Parts manufactured with Fused Deposition Modeling (FDM) often require Design for Additive Manufacturing (DFAM) modifications to ensure printability, structural integrity, and reduced post-processing. Current slicers identify defects such as steep overhangs but are unable to modify the underlying geometry. This work pre…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.5
2026-03-06 · Xudong Wang, Jiahua Dong, Baichen Liu, Qi Lyu, Lianqing Liu, Zhi Han
Research Track A · General AI
Embodied navigation agents powered by large language models have shown strong performance on individual tasks but struggle to continually acquire new navigation skills, which suffer from catastrophic forgetting. We formalize this challenge as lifelong embodied navigation learning (LENL), where an agent is required to a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.5
2026-03-15 · Jiayuan Du, Yuebing Song, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu, Qijun Chen
Research Track A · General AI
End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-03-26 · Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Yuanheng Zhu, Dongbin Zhao
General AI
On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matching to a one-token sig…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-03-30 · Hongtao Wu, Boyun Zheng, Dingjie Song, Yu Jiang, Jianfeng Gao, Lei Xing, Lichao Sun, Yixuan Yuan
General AI
Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, limiting their applicability to clinical medicine, where research is required to be…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-04-02 · Yang Zhou, Xiaofeng Wang, Hao Shao, Letian Wang, Guosheng Zhao, Jiangnan Shao, Jiagang Zhu, Tingdong Yu, Zheng Zhu, Guan Huang, Steven L. Waslander
General AI
Recently, world-action models (WAM) have emerged to bridge vision-language-action (VLA) models and world models, unifying their reasoning and instruction-following capabilities and spatio-temporal world modeling. However, existing WAM approaches often focus on modeling 2D appearance or latent representations, with limi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-04-02 · Difan Jiao, Qianfeng Wen, Blair Yang, Zhenwei Tang, Ashton Anderson
General AI
We introduce ThinkTwice, a simple two-phase framework that jointly optimizes LLMs to solve reasoning problems and refine the answers, based on Group Relative Policy Optimization (GRPO). In each pair of training steps, ThinkTwice first optimizes the model on solving reasoning problems, then optimizes it on refining its …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.5
2026-04-13 · Peng Yuan, Yuyang Yin, Yuxuan Cai, Zheng Wei
Research Track B · General AI
Existing browser agent benchmarks face a fundamental trilemma: real-website benchmarks lack reproducibility due to content drift, controlled environments sacrifice realism by omitting real-web noise, and both require costly manual curation that limits scalability. We present WebForge, the first fully automated framewor…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-04-14 · Linhao Yu, Tianmeng Yang, Siyu Ding, Renren Jin, Naibin Gu, Xiangzhao Hao, Shuaiyi Nie, Deyi Xiong, Weichong Yin, Yu Sun, Hua Wu
General AI
RLVR improves reasoning in large language models, but its effectiveness is often limited by severe reward sparsity on hard problems. Recent hint-based RL methods mitigate sparsity by injecting partial solutions or abstract templates, yet they typically scale guidance by adding more tokens, which introduce redundancy, i…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-04-14 · NVIDIA, Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye, Abhibha Gupta, Abhilash Somasamudramath, Abhinav Khattar, Adeola Adesoba, Adi Renduchintala, Adil Asif, Aditya Agrawal, Aditya Vavre, Ahmad Kiswani, Aishwarya Padmakumar, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Gronskiy, Alex Kondratenko, Alex Neefus, Alex Steiner, Alex Yang, Alexander Bukharin, Alexander Young, Ali Hatamizadeh, Ali Taghibakhshi, Alina Galiautdinova, Alisa Liu, Alok Kumar, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Anahita Bhiwandiwalla, Ananth Subramaniam, Andrew Tao, Anjaney Shrivastava, Anjulie Agrusa, Ankur Srivastava, Ankur Verma, Ann Guan, Anna Shors, Annamalai Chockalingam, Anubhav Mandarwal, Aparnaa Ramani, Arham Mehta, Arti Jain, Arun Venkatesan, Asha Anoosheh, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asli Sabanci Demiroz, Asma Kuriparambil Thekkumpate, Atefeh Sohrabizadeh, Avinash Kaur, Ayush Dattagupta, Barath Subramaniam Anandan, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Benjamin Chislett, Besmira Nushi, Bilal Kartal, Bill Thiede, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Branislav Kisacanin, Brian Yu, Bryan Catanzaro, Buvaneswari Mani, Carlo del Mundo, Chankyu Lee, Chanran Kim, Chantal Hwang, Chao Ni, Charles Wang, Charlie Truong, Cheng-Ping Hsieh, Chenhan Yu, Chenjie Luo, Cherie Wang, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Chris Holguin, Chris Wing, Christian Munley, Christopher Parisien, Chuck Desai, Chunyang Sheng, Collin Neale, Cyril Meurillon, Dakshi Kumar, Dan Gil, Dan Su, Dane Corneil, Daniel Afrimi, Daniel Burkhardt Eliuth Triana, Daniel Egert, Daniel Fatade, Daniel Lo, Daniel Rohrer, Daniel Serebrenik, Daniil Sorokin, Daria Gitman, Daria Levy, Darko Stosic, David Edelsohn, David Messina, David Mosallanezhad, David Tamok, Deena Donia, Deepak Narayanan, Devin O'Kelly, Dheeraj Peri, Dhruv 
Nathawani, Di Wu, Dima Rekesh, Dina Yared, Divyanshu Kakwani, Dmitry Konyagin Brandon Tuttle, Dong Ahn, Dongfu Jiang, Dorrin Poorkay, Douglas O'Flaherty, Duncan Riach, Dusan Stosic, Dustin Van Stee, Edgar Minasyan, Edward Lin, Eileen Peters Long, Elad Segal, Elena Lantz, Elena Lewis, Ellie Evans, Elliott Ning, Eric Chung, Eric Harper, Eric Pham-Hung, Eric W. Tramel, Erick Galinkin, Erik Pounds, Esti Etrog, Evan Briones, Evan Wu, Evelina Bakhturina, Evgeny Tsykunov, Ewa Dobrowolska, Farshad Saberi Movahed, Farzan Memarian, Fay Wang, Fei Jia, Felipe Soares, Felipe Vieira Frujeri, Feng Chen, Fengguang Lin, Ferenc Galko, Fortuna Zhang, Frankie Siino, Frida Hou, Gantavya Bhatt, Gargi Prasad, Geethapriya Venkataramani, Geetika Gupta, George Armstrong, Gerald Shen, Giulio Borghesi, Gordana Neskovic, Gorkem Batmaz, Grace Lam, Grace Wu, Greg Pauloski, Greyson Davis, Grigor Nalbandyan, Guoming Zhang, Guy Farber, Guyue Huang, Haifeng Qian, Haran Kumar Shiv Kumar, Harry Kim, Harsh Sharma, Hayate Iso, Hayley Ross, Herbert Hum, Herman Sahota, Hexin Wang, Himanshu Soni, Hiren Upadhyay, Huy Nguyen, Iain Cunningham, Ido Galil, Ido Shahaf, Igino Padovani, Igor Gitman, Igor Shovkun, Ikroop Dhillon, Ilya Loshchilov, Ingrid Kelly, Itamar Schen, Itay Levy, Ivan Moshkov, Izik Golan, Izzy Putterman, Jain Tu, Jan Baczek, Jan Kautz, Jane Polak Scowcroft, Janica Rosenberg, Jared Casper, Jarrod Pflum, Jason Grant, Jason Sewall, Jatin Mitra, Jeffrey Glick, Jenny Chen, Jesse Oliver, Jiacheng Xu, Jiafan Zhu, Jialin Song, Jian Zhang, Jiaqi Zeng, Jie Lou, Jill Milton, Jim Chow, Jimmy Zhang, Jinhang Choi, Jining Huang, Jocelyn Huang, Joel Caruso, Joey Conway, Joey Guman, Johan Jatko, John Kamalu, Johnny Greco, Jonathan Cohen, Jonathan Raiman, Joseph Jennings, Joyjit Daw, Juan Yu, Julio Tapia, Junkeun Yi, Jupinder Parmar, Jyothi Achar, Kari Briski, Kartik Mattoo, Katherine Cheung, Katherine Luna, Keith Wyss, Kevin Shih, Kezhi Kong, Khanh Nguyen, Khushi Bhardwaj, Kirill Buryak, Kirthi Shankar 
Sivamani, Konstantinos Krommydas, Kris Murphy, Krishna C. Puvvada, Krzysztof Pawelec, Kumar Anik, Laikh Tewari, Laya Sleiman, Leo Du, Leon Derczynski, Li Ding, Lilach Ilan, Lingjie Wu, Lizzie Wei, Luis Vega, Lun Su, Maarten Van Segbroeck, Maer Rodrigues de Melo, Magaret Zhang, Mahan Fathi, Makesh Narsimhan Sreedhar, Makesh Sreedhar, Makesh Tarun Chandran, Manuel Reyes Gomez, Maor Ashkenazi, Marc Cuevas, Marc Romeijn, Margaret Zhang, Mark Cai, Mark Gabel, Markus Kliegl, Martyna Patelka, Maryam Moosaei, Matthew Varacalli, Matvei Novikov, Mauricio Ferrato, Mehrzad Samadi, Melissa Corpuz, Meng Xin, Mengdi Wang, Mengru Wang, Meredith Price, Micah Schaffer, Michael Andersch, Michael Boone, Michael Evans, Michael Z Wang, Miguel Martinez, Mikail Khona, Mike Chrzanowski, Mike Hollinger, Mingyuan Ma, Minseok Lee, Mohammad Dabbah, Mohammad Shoeybi, Mostofa Patwary, Nabin Mulepati, Nader Khalil, Najeeb Nabwani, Nancy Agarwal, Nanthini Balasubramaniam, Narimane Hennouni, Narsi Kodukula, Natalie Hereth, Nathaniel Pinckney, Nave Assaf, Negar Habibi, Nestor Qin, Neta Zmora, Netanel Haber, Nick Reamaroon, Nickson Quak, Nidhi Bhatia, Nikhil Jukar, Nikki Pope, Nikolai Ludwig, Nima Tajbakhsh, Nir Ailon, Nirmal Juluru, Nirmalya De, Nowel Pitt, Oleg Rybakov, Oleksii Hrinchuk, Oleksii Kuchaiev, Olivier Delalleau, Oluwatobi Olabiyi, Omer Ullman Argov, Omri Almog, Omri Puny, Oren Tropp, Otavio Padovani, Ouye Xie, Parth Chadha, Pasha Shamis, Paul Gibbons, Pavlo Molchanov, Peter Belcak, Peter Jin, Pinky Xu, Piotr Januszewski, Pooya Jannaty, Prachi Shevate, Pradeep Thalasta, Pranav Prashant Thombre, Prasoon Varshney, Prerana Gambhir, Pritam Gundecha, Przemek Tredak, Qing Miao, Qiyu Wan, Quan Tran Minh, Rabeeh Karimi Mahabadi, Rachel Oberman, Rachit Garg, Rahul Kandu, Raina Zhong, Ran El-Yaniv, Ran Zilberstein, Rasoul Shafipour, Renee Yao, Renjie Pi, Richard Mazzarese, Richard Wang, Rick Izzo, Ridhima Singla, Rima Shahbazyan, Rishabh Garg, Ritika Borkar, Ritu Gala, Riyad Islam, Robert Clark, 
Robert Hesse, Roger Waleffe, Rohit Varma Kalidindi, Rohit Watve, Roi Koren, Ron Fan, Ruchika Kharwar, Ruisi Cai, Ruoxi Zhang, Russell J. Hewett, Ryan Prenger, Ryan Timbrook, Ryota Egashira, Sadegh Mahdavi, Sagar Singh Ashutosh Joshi, Sahil Modi, Samuel Kriman, Sandeep Pombra, Sanjay Kariyappa, Sanjeev Satheesh, Santiago Pombo, Saori Kaji, Satish Pasumarthi, Saurav Mishra, Saurav Muralidharan, Scott Hara, Sean Narenthiran, Sebastian Rogawski, Seonjin Na, Seonmyeong Bak, Sepehr Sameni, Seth Poulos, Shahar Mor, Shantanu Acharya, Shaona Ghosh Adam Lord, Sharath Turuvekere Sreenivas, Shaun Kotek, Shaya Gharghabi, Shelby Thomas, Sheng-Chieh Lin, Shibani Likhite, Shiqing Fan, Shiyang Chen, Shreya Gopal, Shrimai Prabhumoye, Shubham Pachori, Shubham Toshniwal, Shuo Zhang, Shuoyang Ding, Shyam Renjith, Shyamala Prayaga, Siddhartha Jain, Simeng Sun, Sirisha Rella, Sirshak Das, Smita Ithape, Sneha Harishchandra S, Somshubra Majumdar, Soumye Singhal, Sri Harsha Singudasu, Sriharsha Niverty, Stas Sergienko, Stefana Gloginic, Stefania Alborghetti, Stephen Ge, Stephen McCullough, Sugam Dipak Devare, Suguna Varshini Velury, Sukrit Rao, Sumeet Kumar Barua, Sunny Gai, Suseella Panguluri, Sushil Koundinyan, Swathi Patnam, Sweta Priyadarshi, Swetha Bhendigeri, Syeda Nahida Akter, Sylendran Arunagiri, Tailling Yuan, Talor Abramovich, Tan Bui, Tan Yu, Terry Kong, Thanh Do, Thomas Gburek, Thorgane Marques, Tiffany Moore, Tijmen Blankevoort, Tim Moon, Timothy Ma, Tiyasa Mitra, Tomasz Grzegorzek, Tomer Asida, Tomer Bar Natan, Tomer Keren, Tomer Ronen, Traian Rebedea, Trenton Starkey, Tugrul Konuk, Twinkle Vashishth, Tyler Condensa, Udi Karpas, Ushnish De, Vahid Noorozi, Vahid Noroozi, Vanshil Atul Shah, Veena Vaidyanathan, Venkat Srinivasan, Venmugil Elango, Victor Cui, Vijay Korthikanti, Vikas Mehta, Virginia Adams, Virginia Wu, Vitaly Kurin, Vitaly Lavrukhin, Vladimir Anisimov, Wan Seo, Wanli Jiang, Wasi Uddin Ahmad, Wei Du, Wei Ping, Wei-Ming Chen, Wendy Quan, Wenliang Dai, Wenwen Gao, 
Will Jennings, William Zhang, Xiaowei Ren, Xiaowen Xin, Xin Li, Yang Yu, Yangyi Chen, Yaniv Galron, Yashaswi Karnati, Yejin Choi, Yev Meyer, Yi-Fu Wu, Yian Zhang, Ying Lin, Yonatan Geifman, Yonggan Fu, Yoshi Suhara, Youngeun Kwon, Yuan Zhang, Yuki Huang, Zach Moshe, Zhilin Wang, Zhiyu Cheng, Zhongbo Zhu, Zhuolin Yang, Zihan Liu, Zijia Chen, Zijie Yan, Zuhair Ahmed
General AI
We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts arch…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.5
2026-04-15 · Muhammad Ahmed Ullah Khan, Muhammad Haris Bin Amir, Didier Stricker, Muhammad Zeshan Afzal
Research Track A · General AI
Continual learning enables models to acquire new knowledge over time while retaining previously learned capabilities. However, its application to text-to-3D generation remains unexplored. We present ReConText3D, the first framework for continual text-to-3D generation. We first demonstrate that existing text-to-3D model…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-04-16 · Bowen Ping, Zijun Chen, Tingfeng Hui, Qize Yu, Chenxuan Li, Junchi Yan, Baobao Chang
General AI
Reinforcement Learning (RL) has emerged as a critical driver for enhancing the reasoning capabilities of Large Language Models (LLMs). While recent advancements have focused on reward engineering or data synthesis, few studies exploit the model's intrinsic representation characteristics to guide the training process. I…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-04-20 · Martiño Ríos-García, Nawaf Alampara, Chandan Gupta, Indrajeet Mandal, Sajid Mannan, Ali Asghar Aghajani, N. M. Anoop Krishnan, Kevin Maik Jablonka
General AI
Large language model (LLM)-based systems are increasingly deployed to conduct scientific research autonomously, yet whether their reasoning adheres to the epistemic norms that make scientific inquiry self-correcting is poorly understood. Here, we evaluate LLM-based scientific agents across eight domains, spanning workf…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-04-20 · Guanting Dong, Junting Lu, Junjie Huang, Wanjun Zhong, Longxiang Liu, Shijue Huang, Zhenyu Li, Yang Zhao, Xiaoshuai Song, Xiaoxi Li, Jiajie Jin, Yutao Zhu, Hanbin Wang, Fangyu Lei, Qinyu Luo, Mingyang Chen, Zehui Chen, Jiazhan Feng, Ji-Rong Wen, Zhicheng Dou
General AI
Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-04-21 · Venus Team, Sunhao Dai, Yong Deng, Jinzhen Lin, Yusheng Song, Guoqing Wang, Xiaofeng Wu, Yuqi Zhou, Shuo Yang, Zhenzhe Ying, Zhanwei Zhang, Changhua Meng, Weiqiang Wang
General AI
Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open-data by improving both data quality and data utilization. We present DR…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-04-25 · Víctor Gallego
General AI
Can large language model agents discover hidden safety objectives through experience alone? We introduce EPO-Safe (Experiential Prompt Optimization for Safe Agents), a framework where an LLM iteratively generates action plans, receives sparse binary danger warnings, and evolves a natural language behavioral specificati…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.5
2026-05-02 · Maniru Ibrahim
Research Track A
Differentiable physical networks provide a simple setting in which learning can be studied through the interaction between trainable parameters and physical equilibrium constraints. We investigate sequential learning in differentiable resistor networks governed by Kirchhoff's laws. Although individual input-output map…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.5
2026-05-12 · Xuhao Hu, Xi Zhang, Haiyang Xu, Kyle Qiao, Jingyi Yang, Xuanjing Huang, Jing Shao, Ming Yan, Jieping Ye
Research Track B · General AI
Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal execution paths. This diffi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.5
2026-05-12 · Phu-Quy Nguyen-Lam, Phu-Hoa Pham, Dao Sy Duy Minh, Chi-Nguyen Tran, Huynh Trung Kiet, Long Tran-Thanh
Research Track A · General AI
Object-centric representations promise a key property for few-shot learning: Rather than treating a scene as a single unit, a model can decompose it into individual object-level parts that can be matched and compared across different concepts. In practice, this potential is rarely realized. Continual learners either co…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-05-31 · Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao, Pengji Zhang, Yuhao Qing, Xin Yao, Dong Huang, Lin Zhang, Zhuoran Ji
General AI
Large language models are increasingly deployed as coding agents, shifting safety from individual responses to action sequences. Existing benchmarks, however, primarily assess whether models refuse unsafe prompts, leaving impacts on stateful workspaces largely unexamined. We present SABER, a benchmark for environment-a…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-06-07 · Jiahao Wang, An Ping, Yanghai Wang, Yuanxing Zhang, Shihao Li, Hanyan Bian, Yichi Ren, Yize Zhang, Han Wang, Haowen Chen, Junze Li, Jiaqi Wang, Yiyang Hu, Zhuze Xu, Zijie Zhang, Jiaheng Liu
General AI
While Omni-modal Large Language Models (OLLMs) have demonstrated impressive capabilities in jointly processing audio and visual streams, their ability to strictly adhere to complex, multi-faceted user instructions remains largely unexplored. Existing benchmarks primarily focus on holistic video understanding or text-on…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-06-13 · Shubhang Bhatnagar, Dheeraj Baiju, Narendra Ahuja
General AI
Vision encoders for retrieval are typically trained with class-label supervision: each training pair reduces to a scalar that uniformly pushes the embedding apart or pulls it together, as if every visual attribute either differed or matched. A multimodal large language model (MLLM), shown the same pair, can articulate …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-06-16 · Peixian Zhou, Yuxu Chen, Chaorui Zhang, Wei Han, Bo Bai, Xueyan Niu
General AI
Large language models perform increasingly well on standardized logical reasoning benchmarks, but whether this ability remains robust beyond English is unclear. We introduce ChLogic, an English-Chinese aligned benchmark that tests whether models preserve logical reasoning performance when the same latent logical struc…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-06-16 · Tongxu Luo, Rongsheng Wang, Jiaxi Bi, Chenming Xu, Zhengyang Tang, Jianlong Chen, Juhao Liang, Ke Ji, Shuqi Guo, Yuhao Du, Fan Bu, Wenyu Du, Xiaotong Zhang, Kyle Li, Shaobo Wang, Linfeng Zhang, Yuxuan Liu, Xin Lai, Chenxin Li, Yiduo Guo, Zhexin Zhang, Xinyuan Wang, Tianyi Bai, Ziniu Li, Benyou Wang
General AI
Game generation is an emerging application of coding agents, requiring models to transform natural-language specifications into playable interactive systems. Unlike traditional coding tasks, game generation takes place within a game engine, where scripts, scenes, assets, rendering, and runtime interactions must jointly…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.5
2026-06-16 · Yuhang Huang, Xuan Lv, Junyan Xu, Zhiyuan Yu, Jiazhao Zhang, Ruizhen Hu, Wancheng Feng, Shilong Zou, Hewen Xiao, Ziqiao Zhou, Kaiyun Huang, Zhiyu Peng, Juzhan Xu, Hang Zhao, Chenyang Zhu, Renjiao Yi, Yifei Huang, Douhui Wu, Yan Zhang, Kexu Cheng, Chunhe Song, Yunzhi Xue, Xiuhong Zhang, Leitao Guo, Yunji Chen, Bin Wu, Haibin Yu, Kai Xu
General AI
World foundation models (WFMs) are powerful simulators, yet they predominantly operate in a single-view setting and lack the multi-view 3D consistency required for robotic manipulation. While robotic systems rely on multiple cameras (egocentric, eye-to-hand, and wrist-mounted) for policy learning, current multi-view wo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.5
2026-06-16 · Aagam Sogani, Botao Rui, Swetha Vaidyanathan, Rishi Agarwal, Minghao Yan, Shivaram Venkataraman
Research Track B · General AI
Long-horizon web agents often fail in ways hidden by final-answer evaluation: they may visit useful pages, produce a well-formed answer, and terminate confidently while still missing fields, over-including unsupported items, or relying on stale evidence. We study these failures with Parallel WebBench, a parallel web-ex…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.4
2026-06-22 · Yuting Li, Weihang Fang, Haoyuan Gao, Linghe Kong, Yexin Li, Lichao Sun, Weiran Huang
Research Track A · General AI
The rapid deployment of Vision-Language Models (VLMs) in dynamic environments necessitates the ability to learn continuously without forgetting. However, traditional continual learning (CL) settings often rely on white-box paradigms, which is increasingly invalidated by the shift toward cloud-hosted models. In this pap…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.4
2026-06-23 · Yixuan Tang, Yi Yang
General AI
Dense retrieval embedding models are a fundamental component of modern retrieval-based AI systems. Most dense retrievers are trained with contrastive objectives, which require labeled positive and negative document pairs that are often costly and difficult to obtain. In this work, we investigate whether the autoregress…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.4
2026-06-23 · Fengfeng Liang, Yuechen Zhang, Jiaya Jia
General AI
Existing low-bit KV-cache quantizers often treat each cached key as a flat vector. Under RoPE, however, a key's contribution to a future attention logit decomposes into a position-dependent sum over two-dimensional frequency blocks. This makes key-cache quantization a block-wise bit-allocation problem: high-energy RoPE…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.4
2026-06-24 · Shen Nie, Qiyang Min, Shaoxuan Xu, Zihao Huang, Yuxuan Song, Yong Shan, Yankai Lin, Wayne Xin Zhao, Chongxuan Li, Ji-Rong Wen
General AI
Modern large language models are predominantly trained with autoregressive factorization and causal attention. We present iLLaDA, an 8B masked diffusion language model trained from scratch with fully bidirectional attention. iLLaDA keeps the masked diffusion objective throughout pre-training and supervised fine-tuning …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-03-30 · Seyed Parsa Neshaei, Richard Lee Davis, Tanja Käser
General AI
Reflective writing is known to support the development of students' metacognitive skills, yet learners often struggle to engage in deep reflection, limiting learning gains. Although large language models (LLMs) have been shown to improve writing skills, their use as conversational agents for reflective writing has prod…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-03-30 · Mih Dinh, SouYoung Jin
General AI
Large-scale image datasets frequently contain identifiable or sensitive content, raising privacy risks when training models that may memorize and leak such information. We present Unsafe2Safe, a fully automated pipeline that detects privacy-prone images and rewrites only their sensitive regions using multimodally guide…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-03-31 · Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh
General AI
AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-02 · Alexander Pondaven, Ziyi Wu, Igor Gilitschenski, Philip Torr, Sergey Tulyakov, Fabio Pizzati, Aliaksandr Siarohin
General AI
Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we tackle a fundamental issue of action bin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-06 · Xiangzhao Hao, Zefeng Zhang, Zhenyu Zhang, Linhao Yu, Yao Chen, Yiqian Zhang, Haiyun Guo, Shuohuan Wang, Yu Sun
General AI
Image degradation from blur, noise, compression, and poor illumination severely undermines multimodal understanding in real-world settings. Unified multimodal models that combine understanding and generation within a single architecture are a natural fit for this challenge, as their generative pathway can model the fin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-06 · Yuhang Zhou, Lizhu Zhang, Yifan Wu, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao, Hong Yan
General AI
As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becomes orders of magnitude more expensive: while SWE tasks can be verified via fast-executing unit tests, MLE verification requires running full ML pipelines -- data prepro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-06 · Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu
General AI
What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pip…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-07 · Naen Xu, Jiayi Sheng, Changjiang Li, Chunyi Zhou, Yuyuan Li, Tianyu Du, Jun Wang, Zhihui Fu, Jinbao Li, Shouling Ji
General AI
Puns are a common form of rhetorical wordplay that exploits polysemy and phonetic similarity to create humor. In multimodal puns, visual and textual elements synergize to ground the literal sense and evoke the figurative meaning simultaneously. Although Vision-Language Models (VLMs) are widely used in multimodal unders…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-07 · Hongxu Zhou
General AI
Intrinsic self-correction in Large Language Models (LLMs) frequently fails in open-ended reasoning tasks due to "hallucination snowballing," a phenomenon in which models recursively justify early errors during free-text reflection. While structured feedback can mitigate this issue, existing approaches often rely on e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-09 · Addison J. Wu, Ryan Liu, Shuyue Stella Li, Yulia Tsvetkov, Thomas L. Griffiths
General AI
Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements. This creates the potential for LLM…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-13 · Junlin Liu, Shengnan An, Shuang Zhou, Dan Ma, Shixiong Luo, Ying Xie, Yuan Zhang, Wenling Yuan, Yifan Zhou, Xiaoyu Li, Ziwen Wang, Xuezhi Cao, Xunliang Cai
General AI
Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts--often termed general reasoning--remains under-explored. Unlik…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-14 · Ya-Qi Yu, Fangyu Hong, Xiangyang Qu, Hao Wang, Gaojie Wu, Qiaoyu Luo, Nuo Xu, Huixin Wang, Wuheng Xu, Yongxin Liao, Zihao Chen, Haonan Li, Ziming Li, Dezhi Peng, Minghui Liao, Jihao Wu, Haoyu Ren, Dandan Tu
General AI
The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Existing pipelines often rely on off-policy perturbations or coarse outcome-based signals, which are not well suited to fine-grained visual reasoning. We propose rDP…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-15 · Zhuofeng Li, Yi Lu, Dongfu Jiang, Haoxiang Zhang, Yuyang Bai, Chuan Li, Yu Wang, Shuiwang Ji, Jianwen Xie, Yu Zhang
Research Track A · General AI
The rapid rise in AI conference submissions has driven increasing exploration of large language models (LLMs) for peer review support. However, LLM-based reviewers often generate superficial, formulaic comments lacking substantive, evidence-grounded feedback. We attribute this to the underutilization of two key compone…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-16 · Emanuel Tewolde, Xiao Zhang, David Guzman Piedrahita, Vincent Conitzer, Zhijing Jin
General AI
It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet recent work reports the opposite trend: LLMs with stronger reasoning capabilities behave _less_ cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings. Indeed, our exp…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-16 · Hatice Merve Vural, Doga Kukul, Ege Erdem Ozlu, Demir Ekin Arikan, Bob Mankoff, Erkut Erdem, Aykut Erdem
General AI
Humor is one of the few cognitive tasks where getting the reasoning right matters as much as getting the answer right. While recent work evaluates humor understanding on benchmarks such as the New Yorker Cartoon Caption Contest (NYCC), it largely treats it as black-box prediction, overlooking the structured reasoning p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-17 · Yige Xu, Yongjie Wang, Zizhuo Wu, Kaisong Song, Jun Lin, Zhiqi Shen
General AI
Reasoning in vision-language models (VLMs) has recently attracted significant attention due to its broad applicability across diverse downstream tasks. However, it remains unclear whether the superior performance of VLMs stems from genuine vision-grounded reasoning or relies predominantly on the reasoning capabilities …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-17 · Van-Truong Le
General AI
The complexity of Vietnam's legal texts presents a significant barrier to public access to justice. While Large Language Models offer a promising solution for legal text simplification, evaluating their true capabilities requires a multifaceted approach that goes beyond surface-level metrics. This paper introduces a co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-17 · Seulgi Kim, Mohit Prabhushankar, Ghassan AlRegib
General AI
Vision-Language Models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance, where predictions rely disproportionately on a single modality. Prior approaches primarily address this issue by steering the model's attention allocation, implicitly assuming…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-17 · Xu Huang, Weixin Mao, Yinhao Li, Hua Chen, Jiabao Zhao
General AI
Vision-Language-Action (VLA) models have demonstrated significant potential for embodied decision-making; however, their application in complex chemical laboratory automation remains restricted by limited long-horizon reasoning and the absence of persistent experience accumulation. Existing frameworks typically treat p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-20 · Xinyu Ma, Mingzhou Xu, Xuebo Liu, Chang Jin, Qiang Wang, Derek F. Wong, Min Zhang
General AI
Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial latent space. While offline teacher guidance and entropy-driven strategies have been proposed to add…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-21 · Yi Zhong, Buqiang Xu, Yijun Wang, Zifei Shan, Shuofei Qiao, Guozhou Zheng, Ningyu Zhang
General AI
At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-22 · Joyjit Roy, Samaresh Kumar Singh
General AI
Security Operations Centers (SOCs) increasingly encounter difficulties in correlating heterogeneous alerts, interpreting multi-stage attack progressions, and selecting safe and effective response actions. This study introduces AgentSOC, a multi-layered agentic AI framework that enhances SOC automation by integrating pe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-27 · Sercan Karakaş, Yusuf Şimşek
General AI
This paper investigates whether source trustworthiness shapes Turkish evidential morphology and whether large language models (LLMs) track this sensitivity. We study the past-domain contrast between -DI and -mIs in controlled cloze contexts where the information source is overtly external, while only its perceived reli…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-27 · Aaron J. Li, Nicolas Sanchez, Hao Huang, Ruijiang Dong, Jaskaran Bains, Katrin Jaradeh, Zhen Xiang, Bo Li, Feng Liu, Aaron Kornblith, Bin Yu
General AI
Large language models (LLMs) are increasingly deployed, yet their outputs can be highly sensitive to routine, non-adversarial variation in how users phrase queries, a gap not well addressed by existing red-teaming efforts. We propose Green Shielding, a user-centric agenda for building evidence-backed deployment guidanc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-27 · Yunze Xiao, Vivienne J. Zhang, Chenghao Yang, Ningshan Ma, Weihao Xuan, Jen-tse Huang
General AI
Applications based on large language models (LLMs), such as multi-agent simulations, require population diversity among agents. We identify a pervasive failure mode we term _Persona Collapse_: agents each assigned a distinct profile nonetheless converge into a narrow behavioral mode, producing a homogeneous simula…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-28 · Qianqian Chen, Anglin Liu, Jingyang Zhang, Yudong Zhang
Research Track A · General AI
Accurate brain lesion segmentation in MRI is vital for effective clinical diagnosis and treatment planning. Due to high annotation costs and strict data privacy regulations, universal models require employing Continual Learning (CL) to adapt to evolving clinical tasks without losing previously acquired knowledge. Howev…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-29 · Fei Bai, Huatong Song, Shuang Sun, Daixuan Cheng, Yike Yang, Chuan Hao, Renyuan Li, Feng Chang, Yuan Wei, Ran Tao, Bryan Dai, Jian Yang, Wayne Xin Zhao
General AI
Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent trai…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-29 · Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan
General AI
Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference steps within a single architecture, none address cross-architecture knowledge t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-04-30 · Neemias B da Silva, Rodrigo Minetto, Daniel Silver, Thiago H Silva
General AI
Large Language Models (LLMs) are increasingly used as proxies for human perception in urban analysis, yet it remains unclear whether persona prompting produces meaningful and reproducible behavioral diversity. We investigate whether distinct personas influence urban sentiment judgments generated by multimodal LLMs. Usi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-05-01 · Ziyang Huang, Yi Cao, Ali K. Shargh, Jing Luo, Ruidong Mei, Mohd Zaki, Zhan Liu, Wyatt Bunstine, William Jurayj, Somdatta Goswami, Tyrel McQueen, Michael Shields, Jaafar El-Awady, Paulette Clancy, Benjamin Van Durme, Nicholas Andrews, William Walden, Daniel Khashabi
General AI
Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational scientific workflows, where tasks require not only strong coding ability, but also the ab…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-05-01 · Xihao Chen, Yangyang Guo, Roger Zimmermann
General AI
Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-02 · Devleena Das, Rajeev Patwari, Elliott Delaye, Ashish Sirasao
General AI
Aggressive weight quantization to 2-bit precision offers substantial throughput and memory gains for large language model (LLM) inference, but typically incurs severe accuracy degradation. These gains are particularly relevant for edge and on-device deployment, where memory capacity and bandwidth are primary constraint…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-08 · Radeen Mostafa, Sawradip Saha
Research Track B · General AI
We present SUPERBROWSER, an autonomous web-navigation agent designed against a single guiding hypothesis: a web agent should browse the way a person browses. A human reading a page does not retain every pixel they have seen; they look at a few candidate targets, decide on one, and remember only what is needed to keep t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-08 · Matthew Ho, Brian Liu, Jixuan Chen, Audrey Wang, Lianhui Qin
General AI
Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them can cost domain scientists hours to days. We study simulator setup as a problem of agent-tool interface grounding: what minimal simulator-specific adaptations are needed for an …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-08 · Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov
Research Track B · General AI
A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile agent benchmarks lack this kind of personalization. We introduce iOSWorld, the first int…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-09 · Suozhao Ji, Baodong Wu, Zehao Wang, Lei Xia, Qingping Li, Ruisong Wang, Wenbo Ding, Zhenhua Zhu, Boxun Li, Guohao Dai, Yu Wang
General AI
Long-term LLM agents need persistent memory that can track changing facts and provide relevant evidence across sessions. Existing memory systems often store observations as isolated records, summaries, or indexed fragments, which makes evidence aggregation, fact revision, and memory maintenance difficult. We propose In…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-09 · Wenhao Liu, Hao Shi, Yunhe Li, Weizhi Fei, Xiangyuan Wang, Mengzhe Ruan, Hanxu Hou, Peisong Wang, Linqi Song, Shuang Qiu
General AI
Long chain-of-thought (CoT) trajectories in large language model (LLM) reasoning cause severe inference bottlenecks due to rapid key-value (KV) cache growth. Current decoding-time compression methods mitigate this issue via token eviction, but typically assume a uniform budget distribution across all layers and heads. …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-10 · Michal Chudoba, Sergey Alyaev, Petra Galuscakova, Tomasz Wiktorski
General AI
There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require modification to the co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-11 · Guojun Liao
General AI
Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but neither fully captures the central act of discovery: the formation and evolution of models. This paper…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-11 · Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen, Avinash Atreya, Hanjie Chen, Vicente Ordonez
General AI
Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, wh…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-11 · Elias Lumer, Sahil Sen, Kevin Paul, Vamse Kumar Subbiah
General AI
Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between these two lines of work…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-11 · Zihao Wang, Yiming Li, Yutong Wu, Zheyu Liu, Kangjie Chen, Fok Kar Wai, Pin-Yu Chen, Vrizlynn L. L. Thing, Bo Li, Dacheng Tao, Tianwei Zhang
Research Track B · General AI
Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-13 · Jing Jin, Robert Chu, Ning Yan, Masood S. Mortazavi
General AI
Large language models (LLMs) have facilitated impressive progress in software engineering, code generation, tooling, and systems. Concurrently, a significant body of research has developed which explores a growing variety of methods and systems for applying LLMs to hardware and chip design (e.g., systems for RTL code g…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-15 · Dylan Banarse, Stephen Todd, William Latham, Frederic Fol Leymarie
General AI
This paper investigates the creative process of automated design and artistic evaluation using an evolutionary system. We consider how a multimodal artificial intelligence (AI) model can communicate and guide a combined generative and evolutionary computational system. This creates a framework for the evolution of aest…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-15 · Y. H. Zhou, Z. M. Ma, Y. J. Zhou, Y. T. Li, H. X. Xiang, Y. M. Cheng, T. L. Chen, K. J. Zhang, Z. H. Nan, J. H. Ni, Z. Wu, Q. Y. Pan, S. Zhang, S. Cheng, M. Y. Luo
Research Track B · General AI
SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on message-only smishing classification or expose URL and domain cues that allow models to …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-16 · Ahmed Ryan, Saad Sakib Noor, Md Erfan, Shaswata Mitra, Sudip Mittal, Md Rayhanur Rahman
General AI
Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) is essential for proactive defense, but historically required extensive human effort. Pre-Large Language Model (LLM) automation sped up this process, but could not resolve the complex language and mult…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-16 · Nick Bettencourt, Xiaowei Ding, Kay Giesecke
General AI
As high-quality public web corpora become increasingly exhausted, clean long-context documents have become a scarce and expensive source of training data for large language models (LLMs). Existing long-context corpora are often proprietary and costly to acquire, synthetically generated, or concentrated in narrow domain…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-16 · Wujian Peng, Lingchen Meng, Yuxuan Cai, Xianwei Zhuang, Yuhuan Yang, Rongyao Fang, Chenfei Wu, Junyang Lin, Zuxuan Wu, Shuai Bai
General AI
Unified Multimodal Modeling aims to integrate visual understanding and generation within a single system. However, existing approaches typically rely on two disparate visual tokenizers, which splits the representation space and hinders truly unified modeling. We propose UniAR, a unified autoregressive framework where a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-17 · Timothy Agboada, Shikha Chandel, Yadav Raj Ghimire, Leila Hashemi-Beni
General AI
Visual Question Answering (VQA) in the Remote Sensing (RS) domain presents unique challenges due to the high resolution, multi-scale object distribution, and semantic complexity of aerial imagery. While general-domain Foundation Models have achieved remarkable success, their direct application to RSVQA is hindered by m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-17 · Yingshan Susan Wang, Cedegao E. Zhang, Linlu Qiu, Zexue He, Pengyuan Li, Alex Pentland, Roger P. Levy, Yoon Kim
General AI
Learning to simulate human users in interactive settings could advance the training of agent assistants, evaluation of personalization systems, research in the social sciences, and more. Existing approaches generally do so by training a large language model (LLM) to match a single ground truth response, either by maxim…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-17 · Siyi Gu, Jialin Chen, Sophia Zhou, Arman Cohan, Rex Ying
General AI
Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. Distillation often relies on chain-of-thought annotations that are expensive to obtain and may themselves be noisy, incomplete, or partially incorrect; even when the final solutio…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.2
2026-06-24 · Ilia Kulikov, Chenxi Whitehouse, Tianhao Wu, Yixin Nie, Swarnadeep Saha, Eryk Helenowski, Weizhe Yuan, Olga Golovneva, Jack Lanchantin, Yoram Bachrach, Jakob Foerster, Xian Li, Han Fang, Sainbayar Sukhbaatar, Jason Weston
General AI
We introduce Autodata, a general method that enables AI agents to act as data scientists who build high-quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent so that it learns to create even stronger data. We describe the overall formulation, and a specific practical im…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-03-22 · Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren
Research Track A · General AI
The continual learning literature has rapidly shifted from traditional class incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substanti…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-01 · Marwan Hassani, Tamara Verbeek, Sjoerd van Straten
Research Track A
Predictive process monitoring (PPM) focuses on predicting future process trajectories, including next activity predictions. This is crucial in dynamic environments where processes change or face uncertainty. However, current frameworks often assume a static environment, overlooking dynamic characteristics and concept d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-02 · Xinlei Yu, Zhangquan Chen, Yongbo He, Tianyu Fu, Cheng Yang, Chengming Xu, Yue Ma, Xiaobin Hu, Zhe Cao, Jie Xu, Guibin Zhang, Jiale Tao, Jiayi Zhang, Siyuan Ma, Kaituo Feng, Haojie Huang, Youxing Li, Ronghao Chen, Huacan Wang, Chenglin Wu, Zikun Su, Xiaogang Xu, Kelu Yao, Kun Wang, Chen Gao, Yue Liao, Ruqi Huang, Tao Jin, Cheng Tan, Jiangning Zhang, Wenqi Ren, Yanwei Fu, Yong Liu, Yu Wang, Xiangyu Yue, Yu-Gang Jiang, Shuicheng Yan
Research Track A · General AI
Latent space is rapidly emerging as a native substrate for language-based models. While modern systems are still commonly understood through explicit token-level generation, an increasing body of work shows that many critical internal processes are more naturally carried out in continuous latent space than in human-rea…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-03 · Linyu Li, Zhi Jin, Yichi Zhang, Dongming Jin, Yuanpeng He, Haoran Duan, Gadeng Luosang, Nyima Tashi
Research Track A · General AI
Real-world multimodal knowledge graphs (MMKGs) are dynamic, with new entities, relations, and multimodal knowledge emerging over time. Existing continual knowledge graph reasoning (CKGR) methods focus on structural triples and cannot fully exploit multimodal signals from new entities. Existing multimodal knowledge grap…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-13 · Wei Li, Hangjie Yuan, Zixiang Zhao, Borui Kang, Ziwei Liu, Tao Feng
Research Track A
Continual Learning (CL) aims to train neural networks on a dynamic stream of tasks without forgetting previously learned knowledge. Among optimization-based approaches, C-Flat has emerged as a promising solution due to its plug-and-play nature and its ability to encourage uniformly low-loss regions for both new and old…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-15 · Qianyu Chen, Shujian Yu
Research Track A
Functional magnetic resonance imaging (fMRI) is widely used for studying and diagnosing brain disorders, with functional connectivity (FC) matrices providing powerful representations of large-scale neural interactions. However, existing diagnostic models are trained either on a single site or under full multi-site acce…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-16 · Peifeng Zhang, Zice Qiu, Donghua Yu, Shilei Cao, Juepeng Zheng, Yutong Lu, Haohuan Fu
Research Track A · General AI
In continual visual question answering (VQA), existing Continual Learning (CL) methods are mostly built for symmetric, unimodal architectures. However, modern Vision-Language Models (VLMs) violate this assumption, as their trainable components are inherently asymmetric. This structural mismatch renders VLMs highly pron…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-16 · Amirhosein Javadi, Tuomas Oikarinen, Tara Javidi, Tsui-Wei Weng
Research Track A · General AI
Catastrophic forgetting remains a fundamental challenge in continual learning, in which models often forget previous knowledge when fine-tuned on a new task. This issue is especially pronounced in class incremental learning (CIL), which is the most challenging setting in continual learning. Existing methods to address …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-18 · Dongkyu Cho, Xiyue Li, Samrachana Adhikari, Rumi Chunara
Research Track A · General AI
Continual learning aims to update models under distribution shift without forgetting, yet many high-stakes deployments, such as healthcare, also require interpretability. In practice, models that adapt well (e.g., deep networks) are often opaque, while models that are interpretable (e.g., decision trees) are brittle un…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-22 · Beining Wu, Jun Huang
Research Track A
Federated continual learning (FCL) allows distributed autonomous fleets to adapt collaboratively to evolving terrain types across extended mission lifecycles. However, current approaches face several key challenges: 1) they use uniform protection strategies that do not account for the varying sensitivities to forgettin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-22 · Zeyu Shen, Peter Henderson
Research Track A · General AI
Mixture-of-Experts models, now popular for scaling capacity at fixed inference speed, switch experts at nearly every token. Once a model outgrows available GPU memory, this churn can render optimizations like offloading and pre-fetching ineffective. We make the case that the options framework in reinforcement learning …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-23 · Paul-Tiberiu Iordache, Elena Burceanu
Research Track A · General AI
Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defined by the trainable …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.0
2026-04-25 · Yida Xue, Ningyu Zhang, Tingwei Wu, Zhe Ma, Daxiong Ji, Zhao Wang, Guozhou Zheng, Huajun Chen
General AI
The vast and underexplored ocean plays a critical role in regulating global climate and supporting marine biodiversity, yet artificial intelligence has so far delivered limited impact in this domain due to a fundamental data bottleneck. Specifically, ocean data are highly fragmented across disparate sources and inheren…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-27 · Sivajeet Chand, Kevin Nguyen, Peter Kuntz, Alexander Pretschner
Research Track A · General AI
Large language models (LLMs) perform strongly on general-purpose code generation, yet their applicability to enterprise domain-specific languages (DSLs) remains underexplored, especially for repository-scale change generation spanning multiple files and folder structures from a single natural-language (NL) instruction.…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-04-29 · Aditya A. Ramesh, Alex Lewandowski, Jürgen Schmidhuber
Research Track A · General AI
Continual learning agents with finite capacity must balance acquiring new knowledge with retaining the old. This requires controlled forgetting of knowledge that is no longer needed, freeing up capacity to learn. Weight decay, viewed as a mechanism for forgetting, can serve this role by gradually discarding information…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.0
2026-05-04 · Haixin Wang, Hejie Cui, Chenwei Zhang, Xin Liu, Shuowei Jin, Shijie Geng, Xinyang Zhang, Nasser Zalmout, Zhenyu Shi, Yizhou Sun
General AI
Recent progress in multi-turn reinforcement learning (RL) has significantly improved the performance of reasoning LLMs on complex interactive tasks. Despite advances in stabilization techniques such as fine-grained credit assignment and trajectory filtering, instability remains pervasive and often leads to training collapse…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.0
2026-05-10 · Kun Xiang, Terry Jingchen Zhang, Zirong Liu, Bokai Zhou, Yueling Tang, Junjie Yu, Jiacong Lu, Shangrui Huang, Heng Li, Likui Zhang, Kunkun Liu, Changzheng Zhang, Yangle Fang, Boqiang Guo, Hui-Ling Zhen, Dandan Tu, Yinya Huang, Xiaodan Liang
General AI
We introduce SeePhys Pro, a fine-grained modality transfer benchmark that studies whether models preserve the same reasoning capability when critical information is progressively transferred from text to image. Unlike standard vision-essential benchmarks that evaluate a single input form, SeePhys Pro features four sema…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.0
2026-05-20 · Farima Fatahi Bayat, Moin Aminnaseri, Pouya Pezeshkpour, Estevam Hruschka
General AI
Large language models (LLMs) have become increasingly capable of following instructions and complex reasoning, making prompting a flexible interface for adapting models without parameter updates. Yet prompt design remains labor-intensive and highly sensitive to formatting, phrasing, and instruction order, motivating au…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 14.0
2026-05-20 · Shuofei Qiao, Yunxiang Wei, Jiazheng Fan, Bin Wu, Busheng Zhang, Mengru Wang, Yuqi Zhu, Ningyu Zhang, Keyan Ding, Qiang Zhang, Huajun Chen
General AI
The exponential growth of global academic output has confronted researchers and AI agents with an unprecedented "information explosion," where fragmented and unstructured knowledge organization impedes deep interdisciplinary integration. Current academic retrieval tools predominantly rely on superficial keyword match…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-06-28 · Alex Kwon
Research Track A · General AI
LLM agents carry conclusions across steps and sessions in compressed memory, and memory products (e.g., mem0, LangMem) rewrite conversation into stored "facts" that later steps trust. We show this rewriting manufactures confidence: across our constructed agent settings, a casual, hedged remark becomes a confident, date…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-04 · Hamza Ahmed Durrani, Rafay Suleman Durrani
General AI
The integration of Large Language Model (LLM) reasoning principles into classical robot path planning represents a rapidly emerging research direction. In this paper, we propose a Semantic Risk-Aware Heuristic (SRAH) planner that encodes LLM-inspired cost functions penalising geometrically cluttered or high-risk zones …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-04 · Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi, Xueli An
General AI
Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes in the design of network entities, interfaces, and procedures. The adoption of agentic AI in next-generation networks is expected to enhance network intelligence and auto…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-07 · Hyeongwon Kang, Jeongseob Kim, Jinwoo Park, Pilsung Kang
General AI
Recent studies have explored large language models for time-series anomaly detection, yet existing approaches often rely on a single general-purpose model to directly infer anomaly indices or intervals, limiting controllability, interpretability, and reliability for complex anomaly patterns. We propose SAGE (Specialize…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-07 · Isaac David, Arthur Gervais
General AI
Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many operational settings, the most accessible artifacts are binary packages rather than source patches or advisory text. This paper asks whether a language-model agent, restricted t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-07 · Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava
General AI
Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcomer searches an unfam…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-11 · Kefei Tian, Yuansheng Lian, Kai Yang, Xiangdong Chen, Shen Li
General AI
Safety-critical planning in complex environments, particularly at urban intersections, remains a fundamental challenge for autonomous driving. Existing methods, whether rule-based or data-driven, frequently struggle to capture complex scene semantics, infer potential risks, and make reliable decisions in rare, high-ris…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-12 · Haoyu Wang, Yuliang Song, Tao Li, Zhiwei Deng, Yaqing Wang, Deepak Ramachandran, Eldan Cohen, Dan Roth
General AI
Large Language Models (LLMs) struggle to solve complex combinatorial problems through direct reasoning, so recent neuro-symbolic systems increasingly use them to synthesize executable solvers. A central design question is how the LLM should represent the solver, and whether it should also attempt to optimize search. We…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-12 · Wufei Ma, Chloe Wang, Siyi Chen, Jiawei Peng, Patrick Li, Alan Yuille
General AI
While self-supervised pretraining has reduced vision systems' reliance on synthetic data, simulation remains an indispensable tool for closed-loop optimization and rigorous out-of-distribution (OOD) evaluation. However, modern simulation platforms often present steep technical barriers, requiring extensive expertise in…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-12 · Zhong Li, Zihan Guo, Xiaohan Lu, Juntao Wang, Jie Song, Chao Shen, Jiageng Wu, Mingyang Sun
Research Track A · General AI
Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization sema…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-19 · Han Li, Vibhor Malik, Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ailin Fan, Keat Yang Koay, Yuanzheng Zhu, Meysam Feghhi, Ronie Uliana, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Zhong Wu, Lingyun Wang
Research Track B · General AI
A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM)…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-22 · Joydeep Chandra
General AI
Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared differential-privacy budget. We present CHRONOS, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-22 · Liupeng Li, Haoqian Kang, Zhenyu Lu, Jinpeng Wang, Bin Chen, Ke Chen, Yaowei Wang
General AI
High-resolution (HR) image perception presents a key bottleneck for multimodal large language models (MLLMs). While visual search offers a promising solution, existing methods struggle with the trade-off between coverage and efficiency. Visual expert-assisted search is efficient but prone to blind spots when proposals …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-22 · Michal Shlapentokh-Rothman, Prachi Garg, Yu-Xiong Wang, Derek Hoiem
General AI
Keyframe selection is a direct way to provide verifiable visual evidence for long-video question answering (QA). Queries differ in what they require, and finding the right frames depends on knowing what to look for. Existing keyframe selectors either score every frame against a single query, or decompose the query into…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-27 · Julia Hindel, Simon Bultmann, Houman Masnavi, Daniele Cattaneo, Abhinav Valada
Research Track A · General AI
Self-supervised online traversability estimation enables robots to continuously learn from unlabeled open-world experiences and adapt their navigation behavior toward safe and efficient trajectories. Existing approaches either rely on handcrafted proprioceptive traversability scores, limiting robot-agnosticism, or clus…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-28 · Tenghao Huang, Kung-Hsiang Huang, Prafulla Kumar Choubey, Yilun Zhou, Muhao Chen, Jonathan May, Chien-Sheng Wu
Research Track B · General AI
Web agents, which couple language models with browsing and tool-use capabilities, show promise as open web assistants. Yet progress is increasingly limited by the lack of scalable, process-level supervision. Existing benchmarks are largely manually constructed, providing only coarse start-goal annotations without inter…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-28 · Sy-Tuyen Ho, Minghui Liu, Huy Nghiem, Furong Huang
General AI
Autonomous AI research agents aim to accelerate scientific discovery by automating the research pipeline, from hypothesis generation to peer review. However, existing benchmarks rarely test a fundamental bottleneck: whether Large Language Models can judge the methodological viability of a research idea before expending…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-05-29 · Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yuxin Chen, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, Yoko Yamakata, Tat-Seng Chua
General AI
As Large Language Models (LLMs) evolve from general-purpose assistants to user-centric agents, personalization has become central to aligning model behavior with individual preferences, making the evaluation of personalized alignment a critical bottleneck. Existing evaluation methods, ranging from automatic metrics to L…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-06-29 · Cheng Gong, Haoyang Wang, Chao Lu, Zirui Li, Jianwei Gong
Research Track A · General AI
Autonomous driving policies should be able to improve continually as deployment exposes them to increasingly diverse and long-tail traffic situations. However, most learning-based policies are trained or fine-tuned on expert demonstrations and then rely largely on generalization to handle challenging closed-loop scenar…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.8
2026-06-29 · Nico Daheim, Iryna Gurevych
General AI
With rapidly improving capabilities, Large Language Models (LLMs) are increasingly used in many complex real-world tasks. Beyond requiring in-depth knowledge and reasoning skills, many of these tasks exhibit a high degree of subjectivity and require that the outputs of the model can be trusted. While a lot of progress …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.6
2026-07-01 · Amirreza Rouhi, Rajat Aggarwal, Parikshit Sakurikar, Anoop M. Namboodiri, Sashi P. Reddi
General AI
Foundation video diffusion models are increasingly viewed as world simulators for embodied agents, yet their pretraining on internet-scale generic video leaves them poorly aligned with real-world deployment domains. We study parameter-efficient adaptation of a pretrained foundation video world model to retail scenes: w…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.6
2026-07-02 · Qiaowei Miao, Kehan Li, Yawei Luo, Yi Yang
General AI
Generative diffusion models excel at synthesizing high-quality images, videos, and 3D content under multimodal control. However, arbitrary user-defined modality-to-4D (X-to-4D) generation remains challenging due to the high cost of constructing diverse datasets and the limited scalability of existing methods. This pape…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.6
2026-07-02 · Jiale Amber Wang, Kaiyuan Wang, Pengyu Nie
General AI
Software tests and code evolve together: a code change should be followed by new or updated tests that record the new software behavior. Yet existing test generation and update benchmarks often isolate the test from the code change, and rely on static metadata that does not verify whether a test is executable or semant…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-03-20 · Chiyu Ma, Shuo Yang, Kexin Huang, Jinda Lu, Haoming Meng, Shangshang Wang, Bolin Ding, Soroush Vosoughi, Guoyin Wang, Jingren Zhou
General AI
We present Future-KL Influenced Policy Optimization (FIPO), a reinforcement learning algorithm designed to overcome reasoning bottlenecks in large language models. While GRPO-style training scales effectively, it typically relies on outcome-based rewards (ORM) that distribute a global advantage uniformly across every t…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-03-30 · Zefeng He, Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Yu Cheng, Yang Yang
General AI
Recent multimodal generation models have achieved remarkable progress on general-purpose generation tasks, yet continue to struggle with complex instructions and specialized downstream tasks. Inspired by the success of advanced agent frameworks such as Claude Code, we propose GEMS (Agent-Native Multimodal GEneration wi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-03-31 · Zhaohe Liao, Kaixun Jiang, Zhihang Liu, Yujie Wei, Junqiu Yu, Quanhao Li, Hong-Tao Yu, Pandeng Li, Yuzheng Wang, Zhen Xing, Shiwei Zhang, Chen-Wei Xie, Yun Zheng, Xihui Liu
General AI
Although image generation has boosted various applications via its rapid evolution, whether the state-of-the-art models are able to produce ready-to-use academic illustrations for papers is still largely unexplored. Directly comparing or evaluating the illustration with a VLM is naive but requires oracle multi-modal und…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.5
2026-03-31 · Xiaoyan Zhang, Jiangpeng He
Research Track A · General AI
Visual food recognition in real-world dietary logging scenarios naturally exhibits severe data imbalance, where a small number of food categories appear frequently while many others occur rarely, resulting in long-tailed class distributions. In practice, food recognition systems often operate in a continual learning se…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-02 · Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, Jiacheng Zhu, Xuan Jiang, Sirui Li, Cathy Wu, Bryan Kian Hsiang Low, Jinhua Zhao, Paul Pu Liang
General AI
Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We present CORAL, the first …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-05 · Haonian Ji, Kaiwen Xiong, Siwei Han, Peng Xia, Shi Qiu, Yiyang Zhou, Jiaqi Liu, Jinlong Li, Bingzhou Li, Zeyu Zheng, Cihang Xie, Huaxiu Yao
General AI
AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface through corrections rath…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-06 · Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang, Shuofei Qiao, Kexin Cao, Guozhou Zheng, Xiang Qi, Peng Zhang, Shumin Deng
General AI
Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-06 · Chaoyou Fu, Haozhi Yuan, Yuhao Dong, Yi-Fan Zhang, Yunhang Shen, Xiaoxing Hu, Xueying Li, Jinsen Su, Chengwu Long, Xiaoyao Xie, Yongkang Xie, Xiawu Zheng, Xue Yang, Haoyu Cao, Yunsheng Wu, Ziwei Liu, Xing Sun, Caifeng Shan, Ran He
General AI
With the rapid advancement of video understanding, existing benchmarks are becoming increasingly saturated, exposing a critical discrepancy between inflated leaderboard scores and real-world model capabilities. To address this widening gap, we introduce Video-MME-v2, a comprehensive benchmark designed to rigorously eva…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-07 · Weiyue Li, Ruizhi Qian, Yi Li, Yongce Li, Yunfan Long, Jiahui Cai, Yan Luo, Mengyu Wang
General AI
Large language models (LLMs) are widely explored for reasoning-intensive research tasks, yet resources for testing whether they can infer scientific conclusions from structured biomedical evidence remain limited. We introduce MedConclusion, a large-scale dataset of 5.7M PubMed structured abstracts for biomedical conclu…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-13 · Hanqi Xiao, Vaidehi Patil, Zaid Khan, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal
General AI
As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. We propose a novel p…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-16 · Yixuan Ding, Wei Huang, Ruijie Quan, Xiaojuan Qi, Yi Yang
General AI
Diffusion-based image editing has achieved strong visual fidelity under natural language instructions, yet most existing systems still operate at the level of surface instruction following, without reasoning about the implicit contextual constraints embedded in real user requests. This often leads to visually plausible…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-17 · Sai Srinivas Kancheti, Aditya Sanjiv Kanade, Vineeth N. Balasubramanian, Tanuja Ganu
General AI
Multimodal Reasoning Models (MRMs) leveraging Chain-of-Thought (CoT) based thinking have revolutionized mathematical and logical problem-solving. However, we show that this paradigm struggles with generalized spatial intelligence. We perform a comprehensive evaluation of seventeen models across thirteen spatial benchma…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-18 · Bo Li, Ningyuan Deng, Tianyu Dong, Shaobo Wang, Shaolin Zhu, Lijie Wen
General AI
Multimodal large language models (MLLMs) have shown impressive capabilities, yet they often struggle to effectively capture the fine-grained textual information within images crucial for accurate image translation. This often leads to a modality gap between visual text inputs and textual inputs/outputs for image transl…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-19 · Xinyu Zhu, Yuzhu Cai, Zexi Liu, Cheng Wang, Fengyang Li, Wenkai Jin, Wanxu Liu, Zehao Bing, Bingyang Zheng, Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xianghe Pang, Yaxin Du, Tingjia Miao, Yuzhi Zhang, Ruoxue Liao, Zhaohan Ding, Linfeng Zhang, Yanfeng Wang, Weinan E, Siheng Chen
General AI
The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we pres…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-19 · Yueyang Ding, HaoPeng Zhang, Rui Dai, Yi Wang, Tianyu Zong, Kaikui Liu, Xiangxiang Chu
General AI
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-20 · Yejin Yoon, Minseo Kim, Taeuk Kim
General AI
Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-20 · Sua Lee, Sanghee Park, Jinbae Im
General AI
Multimodal Large Language Models (MLLMs) have been increasingly used as automatic evaluators, a paradigm known as MLLM-as-a-Judge. However, their reliability and vulnerabilities to biases remain underexplored. We find that many MLLM judges fail to reliably integrate key visual or textual cues, yielding unreliable evalua…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-20 · Yilei Jiang, Jinyuan Hu, Qianyin Xiao, Yaozhi Zheng, Ruize Ma, Kaituo Feng, Jiaming Han, Tianshuo Peng, Kaixuan Fan, Manyuan Zhang, Xiangyu Yue
General AI
Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (LLMs) and code agents now solve isolated programming tasks with ease, they consis…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-23 · Vipula Rawte, Ryan Rossi, Franck Dernoncourt, Nedim Lipka
General AI
Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation poses significant risks in high-stakes domains such as healthcare, law, and scientific communication, where trust and veri…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-26 · Fanqing Meng, Lingxiao Du, Zijian Wu, Guanzheng Chen, Xiangyan Liu, Jiaqi Liao, Chonghe Jiang, Zhenglin Wan, Jiawei Gu, Pengfei Zhou, Rui Huang, Ziqi Zhao, Shengyuan Ding, Ailing Yu, Bo Peng, Bowei Xia, Hao Sun, Haotian Liang, Ji Xie, Jiajun Chen, Jiajun Song, Liu Yang, Ming Xu, Qionglin Qiu, Runhao Fu, Shengfang Zhai, Shijian Wang, Tengfei Ma, Tianyi Wu, Weiyang Jin, Yan Wang, Yang Dai, Yao Lai, Youwei Shu, Yue Liu, Yunzhuo Hao, Yuwei Niu, Jinkai Huang, Jiayuan Zhuo, Zhennan Shen, Linyu Wu, Cihang Xie, Yuyin Zhou, Jiaheng Zhang, Zeyu Zheng, Mengkang Hu, Michael Qizhe Shieh
General AI
Language-model agents are increasingly used as persistent coworkers that assist users across multiple working days. During such workflows, the surrounding environment may change independently of the agent: new emails arrive, calendar entries shift, knowledge-base records are updated, and evidence appears across images,…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-27 · Yale Song, Yiwen Song, Nick Losier, Nathan Hodson, Ye Jin, Rhyard Zhu, Yan Xu, Daniel Vlasic, Carina Claassen, Jasmine Leon, Khanh G. LeViet, Zack Chomyn, Joe Timmons, Brett Slatkin, Scott Penberthy, Tomas Pfister
General AI
While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hier…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-27 · Qiliang Liang, Hansi Wang, Zhong Liang, Yang Liu
General AI
LLM agents increasingly rely on reusable skills, capability packages that combine instructions, control flow, constraints, and tool calls. In most current agent systems, however, skills are still represented by text-heavy artifacts, including SKILL.md-style documents and structured records whose machine-usable evidence…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-27 · Chenkai Pan, Xinglong Xu, Yuhang Xu, Yujun Wu, Siyuan Li, Jintao Chen, Conghui He, Jingxuan Wei, Cheng Tan
General AI
Reliably transferring specialized human knowledge from text into large language models remains a fundamental challenge in artificial intelligence. Fine-tuning on domain corpora has enabled substantial capability gains, but the process operates without feedback: when a model fails on a domain task, there is no method to…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-27 · Jiaqi Wang, Wenhao Zhang, Weijie Shi, Yaliang Li, James Cheng
General AI
On-policy distillation (OPD) has shown strong potential for transferring reasoning ability from frontier or domain-specific models to smaller students. While effective on static single-turn tasks, its behavior in multi-turn agent settings remains underexplored. In this work, we identify a key limitation of vanilla OPD …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-28 · Lei Xiong, Kun Luo, Ziyi Xia, Wenbo Zhang, Jin-Ge Yao, Zheng Liu, Jingying Shao, Jianlyu Chen, Hongjin Qian, Xi Yang, Qian Yu, Hao Li, Chen Yue, Xiaan Du, Yuyang Wang, Yesheng Liu, Haiyu Xu, Zhicheng Dou
General AI
Autonomous scientific research has advanced significantly thanks to the development of AI agents. One key step in this process is finding the right scientific literature, whether to explore existing knowledge for a research problem, or to acquire evidence for verifying assumptions and supporting claims. To assess AI age…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-04-29 · Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Yuxiang Zhou, Maria Liakata, Nikolaos Aletras
General AI
Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific p…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-06-02 · Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang, Yihao Liu, Jingwei Ni, Jiaqi Guo, Mengyu Zhou, Kai Tang, Junling Liu, Qinliang Su, Xiaoxi Jiang, Guanjun Jiang
General AI
Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria such as rule-based verifiers, ground-truth references, procedural checklists, and complex ru…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-06-03 · XiuYu Zhang, Yi Shan, Junfeng Fang, Zhenkai Liang
General AI
Large language models are increasingly evaluated by other models, raising a natural question: can a model predict how a judge will score its own output? We find that the ability is largely present before any targeted training: prompted few-shot, a base model already predicts an external judge's multi-attribute quality …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-06-04 · Woojung Song, Nalim Kim, Sangjun Song, Chaewon Heo, Jongwon Lim, Yohan Jo
General AI
Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter, not whether responses align with the character's psychological trajectory, especially in scenarios the source…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-06-04 · Qi Xu, Yue Tan, Shihao Chen, Jiahao Meng, Anna Wang, Shunping Ji, Hao Fei, Jason Li
General AI
Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query -- a setting we term One-to-Many Temporal Grounding (OMTG). Pr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.5
2026-06-10 · Megha Manoj, Sue Ann Campbell
Research Track A
Neural assemblies, transiently coordinated groups of neurons observed in the hippocampus, are thought to underlie the formation of episodic memories. Acetylcholine (ACh), a neuromodulator received by the hippocampus, plays a critical role in memory and learning. A well-supported hypothesis suggests that high l…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.5
2026-06-13 · Yuheng Lu, Qingcheng Zeng, Heli Qi, Puxuan Yu, Fuheng Zhao, Rui Yang, Hitomi Yanaka, Naoto Yokoya, Weihao Xuan
General AI
Deep research agents are increasingly evaluated on their ability to search for evidence, reason over retrieved sources, and produce grounded answers. Existing browsing benchmarks, however, largely assume that the user's query and the supporting evidence are written in the same language, leaving open whether agentic sea…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.5
2026-06-30 · Julien Lefebvre, Stefan Duffner, Mathieu Lefort
Research Track A · General AI
Online Continual Self-Supervised Learning (OCSSL) aims to learn representations from a continuous stream of unlabeled data, without knowledge of task boundaries and under memory constraints. Existing methods rely either on replay buffers that exploit latent space structure, or on regularization alone. We present CLIMB …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.4
2026-06-25 · Zhongxin Guo, Danrui Qi, Hanwen Gu, Peng Cheng, Yongqiang Xiong
Research Track B · General AI
Agents often repeatedly solve similar task instances from scratch, leading to unnecessary reasoning cost and long execution traces. Prior work has explored workflow reuse and executable skill induction, but it remains unclear which task scenarios admit procedural skills and how the shared procedural structure should be…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-03-20 · Xuanwang Zhang, Yuteng Han, Jinnan Qi, Mulong Xie, Zhen Wu, Xinyu Dai
Research Track B · General AI
Despite significant advances in autonomous web navigation, current methods remain far from human-level performance in complex web environments. We argue that this limitation stems from Topological Blindness, where agents are forced to explore via trial-and-error without access to the global topological structure of the…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-03-26 · Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava
General AI
We present an empirical study of how far general-purpose coding agents -- without hardware-specific training -- can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-03-26 · Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz
General AI
Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-referenced overhead imagery, enabling GPS-denied localization and navigation. Existing methods almost universally formulate CVGL as an image-retrieval problem in a contrastively trained embedding space. This ties per…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-03-27 · Shanglin Wu, Yuyang Luo, Yueqing Liang, Kaiwen Shi, Yanfang Ye, Ali Payani, Kai Shu
Research Track A · General AI
Large language model (LLM) multi-agent systems can scale along two distinct dimensions: by increasing the number of agents and by improving through accumulated experience over time. Although prior work has studied these dimensions separately, their interaction under realistic cost constraints remains unclear. In this p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-03-30 · Iman Sharifi, Alex Zongo, Peng Wei
General AI
The growing deployment of small Unmanned Aerial Systems (sUASs) in low-altitude airspaces has increased the need for reliable tactical deconfliction under safety-critical constraints. Tactical deconfliction involves short-horizon decision-making in dense, partially observable, and heterogeneous multi-agent environments…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-03-31 · Iulian Lucău, Adelin-George Voicu
General AI
This paper evaluates whether commercial large language models (LLMs) can function as reliable political advisory tools by comparing their outputs against official legislative reasoning. Using a dataset of 15 Romanian Senate law proposals paired with their official explanatory memoranda (expuneri de motive), we test six…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-03-31 · Shi Li, Vinkle Srivastav, Nicolas Chanel, Saurav Sharma, Nabani Banik, Lorenzo Arboit, Kun Yuan, Pietro Mascagni, Nicolas Padoy
General AI
Surgical procedures are inherently complex and risky, requiring extensive expertise and constant focus to navigate evolving intraoperative scenes well. Computer-assisted systems such as surgical visual question answering (VQA) offer promise for education and intraoperative support. Current surgical VQA research largel…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-03-31 · Xue Jiang, Tianyu Zhang, Ge Li, Mengyang Liu, Taozhi Chen, Zhenhua Xu, Binhua Li, Wenpin Jiao, Zhi Jin, Yongbin Li, Yihong Dong
General AI
Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer. However, this approach suffers from critical limitations in code generation, where upfront thinking is often insufficient as problems' full complexity only reveals itself duri…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-02 · Bangji Yang, Hongbo Ma, Jiajun Fan, Ge Liu
General AI
Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning quality or require comp…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-02 · Jona Ruthardt, Manu Gaur, Deva Ramanan, Makarand Tapaswi, Yuki M. Asano
General AI
Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to focus on the most salient visual cues in the image, with no way to direct them towar…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-02 · Gengsheng Li, Tianyu Yang, Junfeng Fang, Mingyang Song, Mao Zheng, Haiyun Guo, Dan Zhang, Jinqiao Wang, Tat-Seng Chua
General AI
Reinforcement learning with verifiable rewards (RLVR) has become a standard paradigm for post-training large language models. While Group Relative Policy Optimization (GRPO) is widely adopted, its coarse credit assignment uniformly penalizes failed rollouts, lacking the token-level focus needed to efficiently address s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-04 · Ying Yao
Research Track A · General AI
Unsustainable land-use practices in ecologically sensitive regions threaten biodiversity, water resources, and the livelihoods of millions. This paper presents a deep reinforcement learning (RL) framework for optimizing land-use allocation in the Lake Malawi Basin to maximize total ecosystem service value (ESV). Drawin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-06 · Hengrui Gu, Xiaotian Han, Yujing Bian, Kaixiong Zhou
General AI
Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models (LLMs). However, it faces a fundamental limitation termed restricted exploration, where the policy rapidly converges to a narrow set of solutions. While entropy regularization is…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-07 · Shao Wang, Rui Ren, Lin Gui
General AI
The serving paradigm of large language models (LLMs) is rapidly shifting towards complex multi-agent workflows where specialized agents collaborate over massive shared contexts. While Low-Rank Adaptation (LoRA) enables the efficient co-hosting of these specialized agents on a single base model, it introduces a critical…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-07 · Changgeon Ko, Jisu Shin, Hoyun Song, Huije Lee, Eui Jun Hwang, Jong C. Park
General AI
Large language model (LLM) agents are increasingly acting as human delegates in multi-agent environments, where a representative agent integrates diverse peer perspectives to make a final decision. Drawing inspiration from social psychology, we investigate how the reliability of this representative agent is undermined …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-07 · Jintao Sun, Hu Zhang, Donglin Di, Gangyi Ding, Zhedong Zheng
General AI
Vision-Language models (VLMs) have demonstrated remarkable capability in ground-view visual understanding but often fracture when deployed on high-altitude Unmanned Aerial Vehicles (UAVs). The failure largely stems from a pronounced domain shift, characterized by tiny and densely packed objects, repetitive textures, an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-08 · Bingxuan Li, Simo Du, Yue Guo
Research Track A · General AI
Clinical expertise improves not only by acquiring medical knowledge, but by accumulating experience that yields reusable diagnostic patterns. Recent LLM-based diagnostic agents have shown promising progress in clinical reasoning for decision support. However, most approaches treat cases independently, limiting experie…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-09 · Shiwan Zhao, Zhihu Wang, Xuyang Zhao, Jiaming Zhou, Caiyue Xu, Chenfei Liu, Liting Zhang, Yuhang Jia, Yanzhe Zhang, Hualong Yu, Zichen Xu, Qicheng Li, Yong Qin
Research Track A · General AI
Post-training has become central to turning pretrained large language models (LLMs) into aligned and deployable systems. Recent progress spans supervised fine-tuning (SFT), preference optimization, reinforcement learning (RL), process supervision, verifier-guided methods, distillation, and multi-stage pipelines. Yet th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-09 · Haolei Xu, Haiwen Hong, Hongxing Li, Rui Zhou, Yang Zhang, Longtao Huang, Hui Xue, Yongliang Shen, Weiming Lu, Yueting Zhuang
General AI
Multimodal Mixture-of-Experts (MoE) models have achieved remarkable performance on vision-language tasks. However, we identify a puzzling phenomenon termed Seeing but Not Thinking: models accurately perceive image content yet fail in subsequent reasoning, while correctly solving identical problems presented as pure tex…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-09 · Xingyu Xia, Lekai Zhou, Yujie Tang, Xiaozhou Zhu, Hai Zhu, Wen Yao
General AI
Aerial vision-and-language navigation (Aerial VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and autonomously navigate complex three-dimensional environments by grounding language in visual perception. This survey provides a critical and analytical review of the Aerial VL…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-09 · Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja, Jen-tse Huang, Graham Neubig
General AI
Large language models (LLMs) can perform remarkably complex tasks, yet the fine-grained details of how these capabilities emerge during pretraining remain poorly understood. Scaling laws on validation loss tell us how much a model improves with additional compute, but not what skills it acquires in which order. To reme…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-13 · Yuqian Yuan, Wenqiao Zhang, Juekai Lin, Yu Zhong, Mingjian Gao, Binhe Yu, Yunqi Cao, Wentong Li, Yueting Zhuang, Beng Chin Ooi
General AI
Large Multimodal Models (LMMs) have achieved remarkable progress in general-purpose vision-language understanding, yet they remain limited in tasks requiring precise object-level grounding, fine-grained spatial reasoning, and controllable visual manipulation. In particular, existing systems often struggle to identify …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-14 · Jaywon Koo, Jefferson Hernandez, Ruozhen He, Hanjie Chen, Chen Wei, Vicente Ordonez
General AI
We introduce HypoExplore, an agentic framework that formulates neural architecture discovery for visual recognition as a hypothesis-driven scientific inquiry. Given a human-specified high-level research direction, HypoExplore ideates, implements, evaluates, and improves neural architectures through evolutionary branchi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-14 · Muhammad Kamran Janjua, Hugo Silva, Di Niu, Bahador Rashidi
General AI
Multimodal language models (MLLMs) are increasingly paired with vision tools (e.g., depth, flow, correspondence) to enhance visual reasoning. However, despite access to these tool-generated visual cues, MLLMs often fail to benefit from them. Existing approaches typically feed raw tool outputs into the model, but these …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-14 · Han Bao, Penghao Zhang, Yue Huang, Zhengqing Yuan, Yanchi Ru, Rui Su, Yujun Zhou, Xiangqi Wang, Kehan Guo, Nitesh V Chawla, Yanfang Ye, Xiangliang Zhang
General AI
Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present PolicyBench, the first large-scale cross-syst…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-14 · Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu
Research Track B · General AI
Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces, where sub-pixel accuracy is required to interact with dense IDE elements, remains underexplored. Existing a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-16 · Federico Pierucci, Matteo Prandi, Marcantonio Bracale Syrnikov, Marcello Galisai, Piercosma Bisconti
General AI
This paper advances a methodological proposal for safety research in agentic AI. As systems acquire planning, memory, tool use, persistent identity, and sustained interaction, safety can no longer be analysed primarily at the level of the isolated model. Population-level risks arise from structured interaction among ag…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-16 · Moin Aminnaseri, Farima Fatahi Bayat, Nikita Bhutani, Jean-Flavien Bussotti, Kevin Chan, Rafael Li Chen, Yanlin Feng, Jackson Hassell, Estevam Hruschka, Eser Kandogan, Hannah Kim, James Levine, Seiji Maekawa, Jalal Mahmud, Kushan Mitra, Naoki Otani, Pouya Pezeshkpour, Nima Shahbazi, Chen Shen, Dan Zhang
General AI
NL2SQL systems aim to address the growing need for natural language interaction with data. However, real-world information rarely maps to a single SQL query because (1) users express queries iteratively, (2) questions often span multiple data sources beyond the closed-world assumption of a single database, and (3) queri…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-16 · XiangRui Zhang, Qiang Li, Haining Wang
General AI
Binary analysis increasingly relies on large language models (LLMs) to perform semantic reasoning over complex program behaviors. However, existing approaches largely adopt a one-pass execution paradigm, where reasoning operates over a fixed program representation constructed by static analysis tools. This formulation …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-16 · Alexey Khoroshilov, Alexey Chernysh, Orkhan Ekhtibarov, Nini Kamkia, Dmitry Zmitrovich
General AI
Large language models have demonstrated strong performance on general-purpose programming tasks, yet their ability to generate executable algorithmic trading strategies remains underexplored. Unlike standard code benchmarks, trading-strategy generation requires simultaneous mastery of domain-specific financial logic, k…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-17 · Vitor F. Grizzi, Thang Duc Pham, Luke N. Pretzie, Jiayi Xu, Murat Keceli, Cong Liu
General AI
Computational X-ray absorption near-edge structure (XANES) is widely used to probe local coordination environments, oxidation states, and electronic structure in chemically complex systems. However, the use of computational XANES at scale is constrained more by workflow complexity than by the underlying simulation meth…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-17 · Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song
General AI
Although most automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with the strengths of large language models (LLMs) in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the diff…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-17 · Deshan Sumanathilaka, Nicholas Micallef, Julian Hough, Saman Jayasinghe
General AI
Recent advances in language models have substantially improved Natural Language Understanding (NLU). Although widely used benchmarks suggest that Large Language Models (LLMs) can effectively disambiguate, their practical applicability in real-world narrative contexts remains underexplored. SemEval-2026 Task 5 addresses…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-17 · Siddhant Bharadwaj, Ashish Vashist, Fahimul Aleem, Shruti Vyas
General AI
Image geolocalization has traditionally been addressed through retrieval-based place recognition or geometry-based visual localization pipelines. Recent advances in Vision-Language Models (VLMs) have demonstrated strong zero-shot reasoning capabilities across multimodal tasks, yet their performance in geographic infere…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-19 · Mohit Dubey
Research Track B · General AI
Multi-agent systems (MAS) powered by large language models suffer from severe token inefficiency arising from two compounding sources: (i) unstructured parallel execution, where all agents activate simultaneously irrespective of input readiness; and (ii) unrestricted context sharing, where every agent receives the full…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-20 · Ghazal Khalighinejad, Raghuveer Thirukovalluru, Alexander H. Oh, Bhuwan Dhingra
General AI
Many recent document embedding models are trained on document-as-image representations, embedding rendered pages as images rather than the underlying source. Meanwhile, existing benchmarks for scientific document retrieval, such as ArXivQA and ViDoRe, treat documents as images of pages, implicitly favoring such represe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-20 · Liubomyr Horbatko
General AI
Modern sequence models are dominated by Transformers, where self-attention mixes information from the visible context in an input-dependent way. However, when retrieval is not sharp and attention remains diffuse over an effective support $S_{\mathrm{eff}}(t)$, the influence of any individual token is diluted, typically…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-22 · Marisa Hudspeth, Patrick J. Burns, Brendan O'Connor
General AI
We introduce a benchmark dataset for question answering and translation in bilingual Latin and English settings, containing about 7,800 question-answer pairs. The questions are drawn from Latin pedagogical sources, including exams, quizbowl-style trivia, and textbooks ranging from the 1800s to the present. After automa…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-22 · Fulong Fan, Peilin Liu, Fengzhe Liu, Shuyan Yang, Gang Yan
General AI
Large language models perform well on many reasoning tasks, yet they often lack awareness of whether their current knowledge or reasoning state is complete. In non-interactive puzzle settings, the narrative is fixed and the underlying structure is hidden; once a model forms an early hypothesis under incomplete premises…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-23 · Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong, Yunzhi Yao, Yuqi Zhu, Lun Du, Shumin Deng
Research Track A · General AI
Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approaches face a fundamental trade-off: flat memory is efficient but fails to model relational structure, while graph-based m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-23 · Maximilian Stralz, Meshal Alharbi, Yujun Huang, Gioele Zardini
General AI
Designing multi-agent robotic systems requires reasoning across tightly coupled decisions spanning heterogeneous domains, including robot design, fleet composition, and planning. Much effort has been devoted to isolated improvements in these domains, whereas system-level co-design considering trade-offs and task requir…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-24 · Erez Yosef, Oron Anschel, Shunit Haviv Hakimi, Asaf Gendler, Adam Botach, Nimrod Berman, Igor Kviatkovsky
General AI
Recent advancements in large language models have led to significant improvements across various tasks, including mathematical reasoning, which is used to assess models' intelligence in logical reasoning and problem-solving. Models are evaluated on mathematical reasoning benchmarks by verifying the correctness of the f…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-27 · Lirong Gao, Zeqing Wang, Yuyan Cai, Jiayi Deng, Yanmei Gu, Yiming Zhang, Jia Zhou, Yanfei Zhang, Junbo Zhao
General AI
While Large Language Models (LLMs) have increasingly assisted in historical tasks such as text processing, their capacity for professional-level historical reasoning remains underexplored. Existing benchmarks primarily assess basic knowledge breadth or lexical understanding, failing to capture the higher-order skills, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-28 · Xueying Zeng, Youquan Xian, Sihao Liu, Xudong Mou, Yanze Li, Lei Cui, Bo Li
Research Track A · General AI
With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models (LLMs) demonstrate remarkable sem…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-28 · Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen, Kai Xu, Quanjun Yin, Ee-Chien Chang
Research Track B · General AI
Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into webpage content to induce unintended actions. This threat is further amplified for screenshot-based web agents, which opera…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-29 · Bochao Liu, Zhipeng Qian, Yang Zhao, Xinyuan Jiang, Zihan Liang, Yufei Ma, Junpeng Zhuang, Ben Chen, Shuo Yang, Hongen Wan, Yao Wu, Chenyi Lei, Xiao Liang
General AI
Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but or…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-29 · Tianqi Gao, Chengkai Huang, Zihan Wang, Cao Liu, Ke Zeng, Lina Yao
General AI
Large language models (LLMs) have recently been adopted for recommendation by framing user preference modeling as a language generation problem. However, existing latent reasoning approaches typically represent user intent with a single latent vector, which struggles to capture the inherently multi-faceted nature of us…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-29 · Yuanze Hu, Gen Li, Yuqin Lan, Qingchen Yu, Zhichao Yang, Junwei Jing, Zhaoxin Fan, Xiaotie Deng
General AI
Multimodal large language models (MLLMs) have achieved impressive progress on general multimodal tasks, yet they remain brittle on dial-based measurement reading. In this paper, we study this problem through controlled benchmarks and feature-space probing, and show that current MLLMs not only achieve unsatisfactory acc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-30 · Xin Zhou, Dingkang Liang, Xiwu Chen, Feiyang Tan, Dingyuan Zhang, Hengshuang Zhao, Xiang Bai
General AI
Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. Conversely, while Large Language Models (LLMs) demonstrate impressive reaso…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-30 · Keming Wu, Zuhao Yang, Kaichen Zhang, Shizun Wang, Haowei Zhu, Sicong Leng, Zhongyu Yang, Qijie Wang, Sudong Wang, Ziting Wang, Zili Wang, Hui Zhang, Haonan Wang, Hang Zhou, Yifan Pu, Xingxuan Li, Fangneng Zhan, Bo Li, Lidong Bing, Yuxin Song, Ziwei Liu, Wenhu Chen, Jingdong Wang, Xinchao Wang, Xiaojuan Qi, Shijian Lu, Bin Wang
General AI
Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis towa…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-04-30 · Ivan Bercovich
General AI
Terminal-agent benchmarks have become a primary signal for measuring the coding and system-administration capabilities of large language models. As the market for evaluation environments grows, so does the pressure to ship tasks quickly, often without thorough adversarial review of the verification logic. This paper is…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-05-01 · Saeid Jamshidi, Foutse Khomh, Carol Fung, Kawser Wazed Nafi
General AI
The adoption of Internet of Things (IoT) systems at the network edge of smart architectures is increasing rapidly, intensifying the need for security mechanisms that are both adaptive and resource-efficient. In such environments, runtime defence mechanisms are no longer limited to detection alone but become a resource-…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-03 · Jingwen Chen, Wenkai Yang, Shengda Fan, Wenbo Nie, Chenxing Sun, Shaodong Zheng, Yangen Hu, Lu Pan, Ke Zeng, Yankai Lin
Research Track A · General AI
Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). While prior work has predominantly focused on single-iteration transfer, we discover that under multi-iteration exper…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-04 · Guancheng Tu, Xiangjun Fu, Suhao Yu, Yao Tang, Haoqiang Kang, Lianhui Qin, Yizhe Zhang, Jiatao Gu
General AI
Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-08 · Bojie Rong, Zheyu Shen, Qiaoping Wang, Pengfei Kang, Yang Xu, Yawen Wei, Hanyu Wu, Zhi Zhao, Leihao Pei, Linquan Jiang
Research Track B · General AI
We present AliyunConsoleAgent, a web agent framework for automated documentation verification in real-world cloud consoles. Major cloud platforms encompass hundreds of products with rapid feature iteration, causing console UIs to frequently diverge from their corresponding documentation. Verifying that documented proce…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-08 · Mingqi Yuan, Xiaoquan Sun, Shihao Luo, Jiayu Chen
Research Track A · General AI
Online task-free continual learning (TFCL) requires intelligent agents to sequentially accumulate knowledge from an unbounded, non-stationary data stream under strict single-pass constraints and without any explicit task identifiers. Existing online TFCL paradigms primarily rely on parameter-efficient prompt tuning or …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-11 · Arnav Kumar Jain, Yilin Wu, Jesse Farebrother, Gokul Swamy, Andrea Bajcsy
General AI
The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching -- policy evaluation, policy improvement, and test-time planning -- all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: (i) fidelity…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-11 · Charles Moslonka, Amaury de Vitry, Arthur Garnier, Hicham Randrianarivo, Emmanuel Malherbe
General AI
Finance reporting is a natural proving ground for large language models, and the very-long-context capabilities of recent models across all sizes make rigorous evaluation in this domain an increasingly pressing need. Yet most public financial resources reduce the task to plain-text SEC 10-K filings paired with a handfu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-11 · Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu
General AI
Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool use remains underexplored because of its physical complexity and the difficulty o…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-11 · Jiwen Liu, Shujuan Li, Zhixue Fang, Xiaohan Li, Yan Zhou, Zijie Meng, Zhimin Zhang, Yawen Luo, Guoxin Zhang, Yu-Shen Liu, Pengfei Wan
General AI
Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations that fail to handle multi-shot generation or synthesize cross-paired data, which suffer from data scarcity, resulting in …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-15 · Amr Mohamed, Guokan Shang, Michalis Vazirgiannis
General AI
Diffusion large language models (dLLMs) offer a promising alternative to autoregressive decoding by iteratively refining masked sequences, enabling parallel token updates and bidirectional conditioning. Their practical efficiency, however, is limited by sampling procedures that execute a fixed number of reverse denoisi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-16 · Dongyue Lu, Rong Li, Ao Liang, Lingdong Kong, Wei Yin, Lai Xing Ng, Benoit R. Cottereau, Camille Simon Chane, Wei Tsang Ooi
General AI
Event cameras sense the world through asynchronous brightness changes with microsecond latency and high dynamic range, offering motion fidelity far beyond frame-based sensors and capturing temporal structure that conventional exposures often miss. These properties make events a powerful complement to RGB in autonomous …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-16 · Qi Chai, Wenhao Shen, Nanjie Yao, Yue Xia, Kaiyong Zhao, Jie Ma, Guosheng Lin, Hao Wang
General AI
Zero-Shot Object-Goal Navigation (ZS-OGN) requires embodied agents to explore and locate target objects without any prior training. To this end, recent methods leverage foundation models, but they typically rely on static priors and lack adaptation, which leads to repeated errors and costly trial and error. In this pap…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-16 · Ziqi Zhou, Yubo Ye, Sumeet Atul Vadhavka, Linwei Wang, Zhiqiang Tao
General AI
Building personalized cardiac electrophysiology (EP) digital twins requires identifying the appropriate model structure for each patient, not merely fitting parameters. Traditional methods rely on experts to manually prescribe hybrid physics-neural architectures, which requires deep domain expertise and does not transf…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-17 · Anoushka Vyas, Aarushi Dhanuka, Sina Khoshfetrat Pakazad, Henrik Ohlsson
General AI
Production data integration is bottlenecked by repeated, lossy handoffs between data owners, engineers, and analysts who must collaboratively discover, structure, and query enterprise data. We present Data Intelligence Agents (DIA), a system of three agents (Data Interpreter, Schema Creator, and Query Generator) that c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-17 · Yuzhe Huang, Jiaping Wu, Jiaming Jiang, Hezhe Lin, Aikebaier Aierken, Yunlong Wang, Kun Cheng, Ziyuan Jiao, Yuanxin Zhong
General AI
Establishing a universal benchmark for tactile representation learning in robotic manipulation remains challenging due to the diversity of tactile sensor designs, data formats, and robot embodiments. Rather than seeking to establish such a benchmark, we explore a scalable and promising direction for future development: egocentric …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-07-02 · Xue Qin, Simin Luan, Cong Yang, Zhijun Li
Research Track A · General AI
Long-running adaptive intelligent agents face a structural tension between knowledge consolidation and information integrity. Memory consolidation is conventionally treated as an agent-changing operation: a model is fine-tuned, a prompt rewritten, a policy distilled, or a reflection appended to the context that governs…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-06-23 · Haorui Ji, Weizhe Liu, Hongdong Li, Hengkai Guo
General AI
Sparse voxel representation has emerged as a scalable foundation for image-to-3D Gaussian Splatting (3DGS) generation, yet current methods struggle to preserve high-frequency visual details of input images due to two structural bottlenecks. First, they adopt discriminative 2D features optimized for semantic abstraction…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-06-23 · Lavinia Ghita, Dhruv Desai, Ioana Boier
General AI
Large Language Models (LLMs) achieve strong performance across a growing range of domains, yet their scale poses deployment challenges in applications where latency and cost constraints are critical. This paper derives empirical scaling laws for domain-specific LLM compression, quantifying how in-domain and general kno…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-06-24 · Samuel Valland Lyngset, Tor Viljen Raanaas, Gard Sveipe, Eirik Møller Nilsen, Jim Torresen, Kai Olav Ellefsen, Tobias Lømo
General AI
When fine-tuning Large Language Models (LLMs), there has been success in minimizing both memory usage and computation with Parameter-Efficient Fine-Tuning (PEFT), like Low Rank Adaptation (LoRA). In this article, we have explored whether this approach is transferable to the world of robotics and Reinforcement Learning …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.2
2026-06-24 · Filippos Bellos, Andre S. Gala-Garza, Miaowei Wang, Alyssa M. Hardin, Ahmad M. Hider, Yayuan Li, Jing Bi, Susan Liang, Chenliang Xu, Donald S. Likosky, Jason J. Corso
General AI
We introduce SurgAtlas, the largest surgical video-language dataset to date, comprising 15,291 videos (2,391 hours) spanning 18 surgical specialties and over 5,000 procedure types, sourced entirely from publicly available YouTube content. SurgAtlas is also the first surgical video-language dataset to include open surge…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-04-01 · Xiao Zhang, Juntao Lyu, Tianyu Hu, Qianchuan Zhao, Huimin Ma
Research Track A · General AI
Large Language Models (LLMs) generalize across tasks via reusable representations and flexible reasoning, yet remain brittle in real deployment under evolving tasks and continual distribution shift. A common approach is Test-Time Adaptation (TTA); existing TTA methods update models with hand-designed unsupervised ob…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-04-01 · Zhanzhi Lou, Hui Chen, Yibo Li, Qian Wang, Bryan Hooi
Research Track B · General AI
Test-Time Learning (TTL) enables language agents to iteratively refine their performance through repeated interactions with the environment at inference time. At the core of TTL is an adaptation policy that updates the actor policy based on experience from previous episodes, thereby improving future behavior. Existing …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-04-15 · Karthik Singaravadivelan, Anant Gupta, Zekun Wang, Christopher MacLellan, Christopher J. MacLellan
Research Track A
Topic modeling seeks to uncover latent semantic structure in text corpora with minimal supervision. Neural approaches achieve strong performance but require extensive tuning and struggle with lifelong learning due to catastrophic forgetting and fixed capacity, while classical probabilistic models lack flexibility and a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-04-17 · Guransh Singh
Research Track A
Adapting pre-trained vision-language models (VLMs) for robotic control requires injecting high-magnitude continuous gradients from a flow-matching action expert into a backbone trained exclusively with cross-entropy. This cross-modal gradient asymmetry - the spectral dimensionality mismatch between low-rank MSE regress…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-04-22 · Sachin Kumar
Research Track B · General AI
Can small language models achieve strong tool-use performance without complex adaptation mechanisms? This paper investigates this question through Meta-Tool, a controlled empirical study comparing hypernetwork-based LoRA adaptation against carefully designed few-shot prompting. Using a Llama-3.2-3B-Instruct backbone, w…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-04-23 · Yi-Ling Liu, Melvin Laux, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam
Research Track A · General AI
Autonomous underwater vehicles are required to perform multiple tasks adaptively and in an explainable manner under dynamic, uncertain conditions and limited sensing, challenges that classical controllers struggle to address. This demands robust, generalizable, and inherently interpretable control policies for reliable…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-05-04 · Junjie Yu, Pengrui Lu, Weiye Si, Hongliang Lu, Jiabao Wu, Kaiwen Tao, Kun Wang, Lingyu Yang, Qiran Zhang, Xiuting Guo, Xuanyu Wang, Yang Wang, Yanjie Wang, Yi Yang, Zijian Hu, Ziyi Yang, Zonghan Zhou, Binghao Qiang, Borui Zhang, Chenning Li, Enchang Zhang, Feifan Chen, Feng Jian, Fengyin Sun, Hao Qiu, Hao Zheng, Haoran Zhu, Hongyu Liu, Jianbin Deng, Jiaxin Song, Jiaying Chi, Jiayou Shi, Jie Fang, Jinghui Zhong, Jingyu Zhou, Jinze Li, Junfeng Yi, Junyan Yu, Junzhi Xue, Ni Song, Pengyi Chen, Qi Chen, Quansheng Li, Rui Tao, Shenghai Gong, Shenhang Lu, Tianqi Shen, Tianxiang Zhu, Tiehan Kang, Tingyu Li, Wendi Wu, Xiao Shen, Xiao Zhou, Xiaotao Zhang, Xinrong Li, Xuankun Yang, Xun Zhang, Yan Li, Ye Lu, Yi Wang, Yibo Zhou, Yichi Zhang, Yihao Sun, Yijun Huang, Yixin Zhu, Yixuan Wu, Yuchen Sun, Yue Wu, Yuheng Sun, Yukun Li, Yutian Tu, Yuxuan Qin, Yuzhuo Wu, Zeyu Li, Zhengyu Lou, Zhenning Ran, Zizhu He, Pengfei Liu
General AI
Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' real academic workflows…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-05-04 · Thanasis Pantsios, Dimitrios Karageorgiou, Christos Koutlis, George Karantaidis, Olga Papadopoulou, Symeon Papadopoulos
Research Track A · General AI
The rapid advancement of generative Artificial Intelligence (AI) has introduced significant challenges for reliable AI-generated image detection. Existing detectors often suffer from performance degradation under distribution shifts and when encountering newly emerging generative models. In this work, we propose a data…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-05-14 · William Lugoloobi, Samuelle Marro, Jabez Magomere, Joss Wright, Chris Russell
Research Track B · General AI
As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Doing so would represent a significant security risk, enabling targeted attacks tailored to known model vulnerabilities. Across 14 frontier LLMs and four w…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-05-20 · Aditya Chetan, Eric Cai, Peeyush Kushwaha, Bharath Raj Nagoor Kani, Utkarsh Mall, Qianqian Wang, Noah Snavely, Bharath Hariharan
General AI
The emergence of Large Vision-Language Models (LVLMs) has significantly advanced video understanding capabilities. However, existing benchmarks focus predominantly on coarse-grained tasks such as action segmentation, classification, captioning, and retrieval. Furthermore, these benchmarks often rely on entities that ca…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-05-28 · Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li, Yichi Zhang, Taichuan Li, Zhuofan Chen, Zixia Jia, Zilong Zheng, Wenge Rong
General AI
Explaining why dense retrievers assign high relevance scores remains challenging because retrieval decisions are made through opaque high-dimensional embeddings. Existing explanations often focus on surface signals, such as lexical matches, token alignments, or post-hoc textual rationales, and thus provide limited insi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 13.0
2026-06-27 · Han Luo, Bingbing Wen, Lucy Lu Wang
General AI
LLM agents are expected to act over multiple turns, using search, browsing interfaces, and terminal tools to complete user goals. Yet not every goal is well specified or achievable in the available environment. In such cases, a reliable agent should recognize that further interaction is unlikely to help and abstain fro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.0
2026-06-29 · Matan Schliserman, Gon Buzaglo, Itay Evron, Daniel Soudry
Research Track A
We characterize weakly regularized continual classification in homogeneous models as sequential projections onto task margin sets. This result generalizes prior analyses restricted to either stationary (single-task) deep models or continual linear models. We show that global convergence generally fails, even for simple…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-04 · Yuecai Zhu, Nikolaos Tsantalis, Peter C. Rigby
General AI
The promise of Large Language Models in automated software engineering is often measured by functional correctness, overlooking the critical issue of long term maintainability. This paper presents a systematic audit of technical debt in AI-generated software, revealing that AI does not eliminate flaws but rather introd…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-04 · Ruichao Liang, Jing Chen, Xianglong Li, Huangpeng Gu, Yebo Feng, Yue Xue, Cong Wu, Yang Liu
General AI
Smart contract vulnerabilities in Decentralized Finance cause billions of dollars in losses every year, yet the security community faces a critical bottleneck: identifying a vulnerability is not the same as proving it is exploitable. Manual PoC construction is prohibitively labor-intensive, leaving most disclosed vu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-07 · Weien Li, Rui Song, Zeyu Li, Haochen Liu, Gonghao Zhang, Difan Jiao, Zhenwei Tang, Bowei He, Haolun Wu, Xue Liu, Ye Yuan
General AI
Visual document retrieval has become essential for accessing information in visually rich documents. Existing approaches fall into two camps. Late-interaction retrievers achieve strong quality through fine-grained token-level matching but store hundreds of vectors per page, incurring large index footprints and high ser…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-07 · Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin
General AI
Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely purely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajec…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-07 · Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao
General AI
Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches eithe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-12 · Di Wu, Zixiang Ji, Asmi Kawatkar, Bryan Kwan, Jia-Chen Gu, Nanyun Peng, Kai-Wei Chang
Research Track B · General AI
Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, existing memory benchmarks for agents mostly focus on user histories, short traces, or downstream task success, leaving open …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-12 · Haiwen Diao, Penghao Wu, Hanming Deng, Jiahao Wang, Shihao Bai, Silei Wu, Weichen Fan, Wenjie Ye, Wenwen Tong, Xiangyu Fan, Yan Li, Yubo Wang, Zhijie Cao, Zhiqian Lin, Zhitao Yang, Zhongang Cai, Yuwei Niu, Yue Zhu, Bo Liu, Chengguang Lv, Haojia Yu, Haozhe Xie, Hongli Wang, Jianan Fan, Jiaqi Li, Jiefan Lu, Jingcheng Ni, Junxiang Xu, Kaihuan Liang, Lianqiang Shi, Linjun Dai, Linyan Wang, Oscar Qian, Peng Gao, Pengfei Liu, Qingping Sun, Rui Shen, Ruisi Wang, Shengnan Ma, Shuang Yang, Siyi Xie, Siying Li, Tianbo Zhong, Xiangli Kong, Xuanke Shi, Yang Gao, Yongqiang Yao, Yves Wang, Zhengqi Bai, Zhengyu Lin, Zixin Yin, Wenxiu Sun, Ruihao Gong, Quan Wang, Lewei Lu, Lei Yang, Ziwei Liu, Dahua Lin
General AI
Recent large vision-language models (VLMs) remain fundamentally constrained by a persistent dichotomy: understanding and generation are treated as distinct problems, leading to fragmented architectures, cascaded pipelines, and misaligned representation spaces. We argue that this divide is not merely an engineering arti…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-20 · Chongrui Ye, Yuxiang Liu, Yu Wang, Haofei Yu, Yining Zhao, Ge Liu, Julian McAuley, Jiaxuan You
Research Track A · Research Track B · General AI
Language agents increasingly operate over streams of related tasks, yet existing memory systems struggle to convert accumulated experience into reusable knowledge. Retrieval-augmented and structured memory methods record per-session observations effectively, but often couple acquisition and consolidation into a single …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-21 · Shuo Yang, Jinda Lu, Kexin Huang, Chiyu Ma, Shaohang Wei, Yuyang Liu, Guoyin Wang, Jingren Zhou, Li Yuan
General AI
Reinforcement Learning with Verifiable Rewards (RLVR) has become a promising paradigm for scaling reasoning capabilities of Large Language Models (LLMs). However, the sparsity of binary verifier rewards often leads to low efficiency and optimization instability. To stabilize training, existing methods typically impose …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-28 · Junyan Ye, Jun He, Zilong Huang, Dongzhi Jiang, Xuan Yang, Rui Chen, Weijia Li
General AI
Image generation models have evolved from text-conditioned pixel synthesis toward multimodal agents endowed with visual comprehension and tool invocation capabilities. Yet, existing agents remain at the mercy of underlying black-box image models. Their workflow is trapped in a repetitive cycle of prompt rewriting for g…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-28 · Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui, Longtao Huang, Hui Xue, Ningyu Zhang
General AI
Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used for such memory updates, existing studies mainly rely on qualitative downstream evaluations, leaving the quantitative capacity limits and unde…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-28 · Grant Hamblin, Kevin Song, Zhanda Zhu, Anand Jayarajan, Sihang Liu, Nandita Vijaykumar, Gennady Pekhimenko
General AI
Software engineering (SWE) agents are transitioning from code generation to full software development lifecycle automation. A critical phase in this lifecycle is specification design: transforming initial proposals into carefully considered requirements through expert review. Existing benchmarks such as SWE-Bench are i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-28 · M. Ross Kunz, John Merickel, Keith Wilson
General AI
Numeric tabular datasets are the dominant data format in scientific practice, yet large language models lack native mechanisms for representing numeric datasets in a meaningful way across heterogeneous feature spaces. Existing approaches either target predictive modeling over individual datasets, which requires a share…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-28 · You-Zhe Xie, Yu-Hsuan Li, Jie-Ying Lee, Kaipeng Zhang, Yu-Lun Liu, Zhixiang Wang
General AI
As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly rely on synthetic data, limiting real-world generalization due to the sim-to-real gap. We present YoCausal, a two-level …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-29 · Navin Sriram Ravie, Andrew Jong, Krrish Jain, John Liu, Omar Alama, Bijo Sebastian, Sebastian Scherer
Research Track A · General AI
In robotics, dangers and adversity modes are often embodiment-specific and relative to each agent. A frontier of autonomous mobile robotics is to enable agents to operate effectively in the wild in unseen unstructured environments. A significant challenge in unseen unstructured environments is that it may not be possib…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-29 · Rosario Forte, Giuseppe Lando, Antonino Furnari
Research Track A · General AI
Continuous episodic memory is a core capability for autonomous agents operating in dynamic, real-world environments, yet current streaming video benchmarks provide limited tools for diagnosing what models remember and for how long. We introduce EgoStream, a diagnostic benchmark for streaming episodic memory evaluation…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-05-29 · Zhikun Xu, Yu Feng, Jacob Dineen, Taiwei Shi, Jieyu Zhao, Ben Zhou
General AI
Large language model agents trained with reinforcement learning (RL) often learn brittle, task-specific shortcuts. We hypothesize that agents generalize better when their successful trajectories are structurally compressible, decomposed into a small set of reusable abstract patterns. To formalize this, we introduce Reu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-06-29 · Asif Shahriar, Hongyu Cai, Hadjer Benkraouda, Gang Wang, Z. Berkay Celik
General AI
Researchers and practitioners increasingly apply Large Language Models (LLMs) for automated vulnerability detection. Recent work has shown that LLMs are susceptible to the same cognitive heuristics that bias human judgment. Yet, no work has investigated whether these heuristics affect a model's assessment of code vulne…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.8
2026-07-02 · Rintaro Otsubo, Ryo Fujii, Reina Ishikawa, Taiki Kanaya, Kanta Sawafuji, Hiroki Kajita, Shigeki Sakai, Hideo Saito, Ryo Hachiuma
Research Track A · General AI
Vision-Language Models (VLMs) have demonstrated immense promise in Spatio-Temporal Video Grounding (STVG). However, current evaluation protocols are largely confined to zero-shot assessments on general, daily-life benchmarks. This creates a critical disconnect from real-world applications in specialized fields, where m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.6
2026-07-02 · Jijie Zhang, Zhe Ren, Quan Zhang, Dandan Guo
General AI
Large language models (LLMs) exhibit remarkable reasoning capabilities, but their task-specific fine-tuning is notoriously plagued by overconfidence, severely hindering trustworthy deployment. We propose Data-Adaptive Lower-Rank Adaptation (DALorRA), a simple and effective variational Bayesian sparse framework that shi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.6
2026-07-02 · Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng
General AI
Many everyday programming tasks resist clean rule-based implementation, such as alerting on important log lines, repairing malformed JSON, or ranking search results by intent, and are increasingly outsourced to large language model APIs at the cost of locality, reproducibility, and price. We propose fuzzy-function prog…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-03-17 · Shuvam Banerji Seal, Aheli Poddar, Alok Mishra, Dwaipayan Roy
General AI
This paper introduces AgriIR, a configurable retrieval augmented generation (RAG) framework designed to deliver grounded, domain-specific answers while maintaining flexibility and low computational cost. Instead of relying on large, monolithic models, AgriIR decomposes the information access process into declarative mo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-03-22 · Shenghan Chen, Yiming Liu, Yanzhen Wang, Yujia Wang, Xiankai Lu
Research Track A · General AI
Balancing the performance trade-off on long-tail (LT) data distributions remains a long-standing challenge. In this paper, we posit that this dilemma stems from a phenomenon called "tail performance degradation" (the model tends to severely overfit on head classes while quickly forgetting tail classes) and pose a solution …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 12.5
2026-03-25 · Risa Shinoda, Kaede Shiohara, Nakamasa Inoue, Kuniaki Saito, Hiroaki Santo, Fumio Okura
General AI
Understanding animal species from multimodal data poses an emerging challenge at the intersection of computer vision and ecology. While recent biological models, such as BioCLIP, have demonstrated strong alignment between images and textual taxonomic information for species identification, the integration of the audio …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-03-25 · Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi, Jacob Andreas, Yoon Kim
General AI
Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mode. While this is generally not a problem for benchmark-style evaluations that assume one correct answer, many real-wor…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-03-30 · He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, Zixian Huang, Sheng Yuan, Qinxiu Cheng, Xinchen Xie, Yicheng Chen, Yining Li, Jiaxing Xie, Huanan Dong, Yaguang Wu, Xiangjun Huang, Jian Yang, Hui Wang, Bowen Zhou, Bowen Li, Qipeng Guo, Kai Chen
General AI
We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-04-01 · Xingxing Weng, Ruifeng Ni, Chao Pang, XiangYu Hao, Yishan Wang, Xiaokang Zhang, Wei Xu, Gui-Song Xia
Research Track A · General AI
Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-04-01 · Jie Mei, Li-Leng Peng, Keith Fuller, Jenq-Neng Hwang
Research Track A
For continual learning, text-prompt-based methods leverage text encoders and learnable prompts to encode semantic features for classes that arrive sequentially over time. A common challenge encountered by existing works is how to learn unique text prompts, which implicitly carry semantic information of new classes, so that…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-05 · Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen
General AI
Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis of noisy label mech…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-05 · Satyam Kumar, Saurabh Jha
General AI
Deploying large language models (LLMs) on heterogeneous edge devices demands frameworks that jointly optimize energy efficiency, inference quality, and reliability. Our prior QEIL v1 (Kumar & Jha, 2026) achieved 4.82x IPW improvement but relied on static efficiency factors, greedy optimization, and unverified candidate…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-06 · Yujian Liu, Jiabao Ji, Li An, Tommi Jaakkola, Yang Zhang, Shiyu Chang
General AI
Agent skills, which are reusable, domain-specific knowledge artifacts, have become a popular mechanism for extending LLM-based agents, yet formal benchmarking of skill usage performance remains scarce. Existing skill benchmarking efforts focus on overly idealized conditions, where LLMs are directly provided with hand-cr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-12 · Sandro Andric
General AI
Large language models are increasingly used as agents in social, economic, and policy simulations. A common assumption is that stronger reasoning should improve simulation fidelity. We argue that this assumption can fail when the objective is not to solve a strategic problem, but to sample plausible boundedly rational …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-13 · Yuqing Yang, Tengxiao Liu, Wang Bill Zhu, Taiwei Shi, Linxin Song, Robin Jia
General AI
As LLM-based assistants become persistent and personalized, they must extract and retain useful information from past conversations as memory. However, the types of information worth remembering vary considerably across tasks. We formalize the heterogeneous memory extraction task and introduce BEHEMOTH, a benchmark tha…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-16 · Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh
General AI
Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results: it never sees how the corpus is organized or what it has not yet retrieved, limiting its ability to backtrack or combine scattered evidence. We present Corpus2Skill, which distil…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-16 · Joongwon Kim, Wannan Yang, Kelvin Niu, Hongming Zhang, Yun Zhu, Eryk Helenowski, Ruan Silva, Zhengxing Chen, Srinivasan Iyer, Manzil Zaheer, Daniel Fried, Hannaneh Hajishirzi, Sanjeev Arora, Gabriel Synnaeve, Ruslan Salakhutdinov, Anirudh Goyal
General AI
Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined. Long-horizon coding agents violate this premise: each attempt produces an extended trajectory of actions, observations, erro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-04-17 · Fazeng Li, Gan Sun, Chenxi Liu, Yao He, Wei Cong, Yang Cong
Research Track A
Hand-eye calibration through visual localization is a critical capability for robotic manipulation in open-world environments. However, most deep learning-based calibration models suffer from catastrophic forgetting when adapting to unseen data amid open-world scene changes, while simple rehearsal-based continual …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-04-17 · Eunju Lee, MiHyeon Kim, JuneHyoung Kwon, Yoonji Lee, JiHyun Kim, Soojin Jang, YoungBin Kim
Research Track A · General AI
Pretrained Vision-Language Models (VLMs) like CLIP show promise in continual learning, but existing Few-Shot Class-Incremental Learning (FSCIL) methods assume homogeneous domains and balanced data distributions, limiting real-world applicability where data arises from heterogeneous disciplines with imbalanced sample av…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-24 · Bin Wu, Arastun Mammadli, Xiaoyu Zhang, Emine Yilmaz
General AI
The rapid growth of AI agent ecosystems is transforming how complex tasks are delegated and executed, creating a new challenge of identifying suitable agents for a given task. Unlike traditional tools, agent capabilities are often compositional and execution-dependent, making them difficult to assess from textual descr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-04-28 · Yuxin Zhang, Xiangyu Tony Zhang, Daijiao Liu, Fei Tian, Yayue Deng, Jun Chen, Qingjian Lin, Haoyang Zhang, Yuxin Li, Jinglan Gong, Yechang Huang, Liang Zhao, Chengyuan Yao, Hexin Liu, Eng Siong Chng, Xuerui Yang, Gang Yu, Xiangyu Zhang, Daxin Jiang
General AI
Recent advancements in large audio language models have extended Chain-of-Thought (CoT) reasoning into the auditory domain, enabling models to tackle increasingly complex acoustic and spoken tasks. To elicit and sustain these extended reasoning chains, the prevailing paradigm -- driven by the success of text-based reas…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-05-21 · Anuj Apte, Pranav Deshpande, Niraj Kumar, Shouvanik Chakrabarti, Junhyung Lyle Kim
Research Track A
Standard neural network training relies on learning-rate schedules tied to a fixed horizon, leading to strong path dependence and costly re-tuning as data availability changes. Schedule-Free (SF) methods address this by removing explicit schedules, yet SF-AdamW, the current state-of-the-art anytime optimizer, consisten…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-05-28 · Srikar Prabhas Kandagatla, Sreehitha R. Narayana, Chandana Magapu, Swetha Mohan, Shamanth Kuthpadi, Hongjie Chen, Ryan A. Rossi, Franck Dernoncourt, Nesreen Ahmed
General AI
Music recommendation systems typically treat songs as opaque tokens, relying on collaborative interaction histories, which overlooks semantic or acoustic content. Prior work has explored LLM-augmented, multimodal, and text-enhanced approaches to sequential recommendation, and while some methods partially combine semanti…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-05-29 · Haoxiang Zhang, Qixin Xu, Zhuofeng Li, Lei Zhang, Pengcheng Jiang, Yu Zhang, Julian McAuley
Research Track B · General AI
Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-06-01 · Haowen Hou, Zhen Huang, Zheming Liang, Qingyi Si, Chenglin Li, Shuai Dong, Kele Shao, Ruilin Li, Dianyi Wang, Nan Duan, Jiaqi Wang
General AI
Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video MLLMs) usually encode each sampled frame as an independent RGB image, causing visual tokens to repeat content already present in earlier frames. This suggests a m…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-06-04 · Hanxu Hu, Zdeněk Šnajdr, Pinzhen Chen, Jannis Vamvas, Rico Sennrich
Research Track A · General AI
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-06-08 · Yang Tian, Rui Wang, Xumeng Wen, Junjie Li, Shizhao Sun, Lei Song, Jiang Bian, Bo Zhao
General AI
Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-based reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is especially pronounced …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-06-09 · Can Lin, Tao Feng, Hangjie Yuan, Dan Zhang, Yifan Zhu, Zhonghong Ou
Research Track A · Research Track B · General AI
Graphical User Interfaces (GUIs) serve as the dominant medium for human-computer interaction, yet building GUI agents that generalize across the vast diversity of real-world interface environments, with the same flexibility and robustness that humans naturally exhibit, remains unsolved. Notably, GUI data are inherently…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-06-11 · Siyi Chen, Xiaoyan Zhang, Meng Wu, Jonathan Tremblay, Valts Blukis, Stan Birchfield, Rene Vidal, Alvaro Velasquez, Sijia Liu, Qing Qu
General AI
Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterog…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-06-15 · DreamX Team, Yancheng Bai, Rui Chen, Xiangxiang Chu, Rujing Dang, Hao Dou, Bingjie Gao, Qiwen Gu, Siyu Hong, Jiachen Lei, Geng Li, Jifan Li, Ruimin Lin, Qingfeng Shi, Bingze Song, Lei Sun, Jing Tang, Ruitian Tian, Jun Wang, Jiahong Wu, Pengfei Zhang, Shen Zhang, Jiashu Zhu
General AI
DreamX-World 1.0 is a general-purpose interactive text/image-to-video world model for controllable long-horizon generation. It supports camera navigation, revisits to previously observed regions, and promptable events across photorealistic, game-style, and stylized domains. Our data engine combines camera-accurate Unre…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.4
2026-06-22 · Subarnaduti Paul, Yohan Jung, Mohammad Emtiyaz Khan, Siddharth Swaroop, Thomas Möllenhoff, Martin Mundt
Research Track A · General AI
Continual learning remains a major challenge for modern deep networks, partly because commonly used optimizers lack inherent mechanisms for continual adaptation. One such natural mechanism is fast and slow adaptation to balance stability and plasticity. This mechanism has deep roots in neuroscience and biology, but the…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.4
2026-06-23 · Xirui Li, Zhe Liu, Xiaoqing Ye, Wenhua Han, Yifeng Pan, Junyu Han, Hengshuang Zhao
General AI
Multimodal driving planning faces a long-standing tension between two paradigms: scoring-based methods benefit from dense reward supervision but are confined to a fixed action vocabulary, while anchor-based methods generate proposals dynamically yet suffer from sparse supervision constrained to a single ground-truth tr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.4
2026-06-23 · Animesh Animesh, Satheesh K Perepu, Kaushik Dey
Research Track A · General AI
In cooperative multi-agent reinforcement learning (MARL), from a deployment perspective, it is challenging and expensive to train agents from scratch for each new environment or task. In this work, we propose GCT-MARL, a transfer learning framework that builds on the multi-view graph contrastive backbone of MAIL and au…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.4
2026-06-23 · Chenhao Dang, Jing Ma, Mingjie Liao
General AI
The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data mixtures during training, has emerged as a promising direction to improve efficiency. Howeve…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-03-25 · Bingqing Wei, Zhongyu Xia, Dingai Liu, Xiaoyu Zhou, Zhiwei Lin, Yongtao Wang
General AI
Vision-language models (VLMs) have shown remarkable general capabilities, yet embodied agents built on them fail at complex tasks, often skipping critical steps, proposing invalid actions, and repeating mistakes. These failures arise from a fundamental gap between the static training data of VLMs and the physical inter…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-03-26 · Zirui Zhang, Haoyu Dong, Kexin Pei, Chengzhi Mao
General AI
Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-03-26 · Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
General AI
Vision-language-action models have reshaped autonomous driving to incorporate languages into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To addr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-03-27 · Mahesh Bhosale, Abdul Wasi, Shantam Srivastava, Shifa Latif, Tianyu Luan, Mingchen Gao, David Doermann, Xuan Gong
General AI
While powerful in image-conditioned generation, multimodal large language models (MLLMs) can display uneven performance across demographic groups, highlighting fairness risks. In safety-critical clinical settings, such disparities risk producing unequal diagnostic narratives and eroding trust in AI-assisted decision-ma…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-03-29 · Zhaopeng Feng, Liangcai Su, Zhen Zhang, Xinyu Wang, Xiaotian Zhang, Xiaobin Wang, Runnan Fang, Qi Zhang, Baixuan Li, Shihao Cai, Rui Ye, Hui Chen, Jiang Yong, Joey Tianyi Zhou, Chenxiong Qian, Pengjun Xie, Bryan Hooi, Zuozhu Liu, Jingren Zhou
Research Track B · General AI
As large language models (LLMs) evolve into autonomous agents for long-horizon information-seeking, managing finite context capacity has become a critical bottleneck. Existing context management methods typically commit to a single fixed strategy throughout the entire trajectory. Such static designs may work well in so…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-03-30 · Ikechukwu Uchendu, Swati Goel, Karly Hou, Ebrahim Songhori, Kuang-Huei Lee, Joe Wenjie Jiang, Vijay Janapa Reddi, Vincent Zhuang
General AI
We propose using Vision-Language Models (VLMs) for macro placement in chip floorplanning, a complex optimization task that has recently shown promising advancements through machine learning methods. Because human designers rely heavily on spatial reasoning to arrange components on the chip canvas, we hypothesize that V…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-01 · Yutao Yang, Junsong Li, Qianjun Pan, Jie Zhou, Kai Chen, Qin Chen, Jingyuan Zhao, Ningning Zhou, Xin Li, Liang He
Research Track A · General AI
Existing methods for AI psychological counselors predominantly rely on supervised fine-tuning using static dialogue datasets. However, this contrasts with human experts, who continuously refine their proficiency through clinical practice and accumulated experience. To bridge this gap, we propose an Experience-Driven Li…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-02 · Ruozhen He, Nisarg A. Shah, Qihua Dong, Zilin Xiao, Jaywon Koo, Vicente Ordonez
General AI
Existing visual grounding benchmarks primarily evaluate alignment between image regions and literal referring expressions, where models can often succeed by matching a prominent named category. We explore a complementary and more challenging setting of scenario-based visual grounding, where the target must be inferred …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-02 · Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
General AI
Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-03 · Renze Lou, Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Suman Nath, Wenpeng Yin, Jianfeng Gao
Research Track B · General AI
As web agents rapidly evolve, an increasing body of work has moved beyond conventional atomic browser interactions and explored tool use as a higher-level action paradigm. Although prior studies have shown the promise of tools, their conclusions are often drawn from limited experimental scales and sometimes non-compara…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-06 · Tien Nguyen, Muhammad Ali Gulzar, Kirshanthan Sundararajah
General AI
Scientific software relies on high-precision computation, yet finite floating-point representations can introduce precision errors that propagate in safety-critical domains. Despite the growing use of large language models (LLMs) in scientific applications, their reliability in handling floating-point numerical stabili…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-06 · Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen
General AI
Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position during RoPE, making representative queries very few, leading to poor top-…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-07 · Ahmet Rasim Emirdagi, Süleyman Aslan, Mısra Yavuz, Görkay Aydemir, Yunus Bilge Kurt, Nasrin Rahimi, Burak Can Biner, M. Akın Yılmaz
General AI
Metal artifacts from high-attenuation implants severely degrade CT image quality, obscuring critical anatomical structures and posing a challenge for standard deep learning methods that require extensive paired training data. We propose a paradigm shift: reframing artifact reduction as an in-context reasoning task by a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-09 · Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang, Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, Yang Yang
Research Track A · General AI
Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-13 · Buseong Kim, Heejun Gwon
Research Track A · General AI
In large language models performing long-form reasoning, the KV cache grows rapidly with decode length, creating bottlenecks in memory and inference stability. Existing reasoning-oriented KV compression has mostly followed an eviction-centered view: estimate token importance more accurately, then discard lower-ranked e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-13 · Jiayuan Rao, Tianlin Gui, Haoning Wu, Yanfeng Wang, Weidi Xie
General AI
Modeling open-play soccer tactics is a formidable challenge due to the stochastic, multi-agent nature of the game. Existing computational approaches typically produce single, deterministic trajectory forecasts or focus on highly structured set-pieces, fundamentally failing to capture the inherent variance and branching…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-14 · Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang
General AI
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image, and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is not well characterized by existing benchmarks, which inherently contain indeter…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-14 · Serdar Kadioglu, Karthik Uppuluri, Akash Singirikonda
General AI
There is growing interest in leveraging large language models (LLMs) for text-to-model translation and optimization tasks. This paper aims to advance this line of research by introducing Text2Model and Text2Zinc. Text2Model is a suite of co-pilots based on several LLM strategies with varying …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-16 · Zihao Xu, John Harvill, Ziwei Fan, Yizhou Sun, Hao Ding, Hao Wang
General AI
Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the number of tokens representing inputs. However, existing prompt-compression approache…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-17 · Sarthak Mittal, Leo Gagnon, Guillaume Lajoie
General AI
Frontier models have demonstrated exceptional capabilities following the integration of task-reward-based reinforcement learning (RL) into their training pipelines, enabling systems to evolve from pure reasoning models into sophisticated agents. However, debate persists regarding whether RL genuinely instills new skill…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-17 · Shriram Chennakesavalu, Kirill Shmilovich, Hayley Weir, Colin Grambow, John Bradshaw, Patricia Suriana, Chen Cheng, Kangway Chuang
General AI
Large Language Models (LLMs) have the potential to accelerate small molecule drug design due to their ability to reason about information from diverse sources and formats. However, their practical utility remains unclear due to the lack of benchmarks that reflect real-world scenarios. In this work, we introduce a suite…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-20 · Joonhyuk Lee, Virginia Ma, Sarah Zhao, Yash Nair, Asher Spector, Regev Cohen, Emmanuel J. Candès
General AI
Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground truth acquisition can be time-consuming and expensive. We introduce Fully Unsupervi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-20 · HaeJun Yoo, Yongseop Shin, Insung Lee, Myoung-Wan Koo, Du-Seong Chang
General AI
Audio-text retrieval systems based on Contrastive Language-Audio Pretraining (CLAP) achieve strong performance on traditional benchmarks; however, these benchmarks rely on caption-style queries that differ substantially from real-world search behavior, limiting their assessment of practical retrieval robustness. We pre…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-20 · Yakoub Bazi, Mohamad M. Al Rahhal, Mansour Zuair, Faroun Mohamed
General AI
Change visual question answering (Change VQA) addresses the problem of answering natural-language questions about semantic changes between bi-temporal remote sensing (RS) images. Although vision-language models (VLMs) have recently been studied for temporal RS image understanding, Change VQA remains underexplored in th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-20 · Salman Rahman, Jingyan Shen, Anna Mordvina, Hamid Palangi, Saadia Gabriel, Pavel Izmailov
General AI
Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward signals becomes increasingly difficult, making it essential to understand when RLVR can succeed under weaker forms of sup…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-21 · Zhiyuan Peng, Wei Tao, Xin Yin, Chenhao Ying, Yuan Luo, Yiwen Guo
General AI
Large language models (LLMs) have achieved strong results in code generation, but their ability to generate GUI applications, especially games, remains insufficiently studied. Existing benchmarks mainly evaluate correctness through test cases, which are inadequate for GUI applications because these systems are interact…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-23 · Run Hao, Zhuoran Tan
General AI
Model Context Protocol (MCP) is increasingly adopted for tool-integrated LLM agents, but its multi-layer design and third-party server ecosystem expand risks across tool metadata, untrusted outputs, cross-tool flows, multimodal inputs, and supply-chain vectors. Existing MCP benchmarks largely measure robustness to mali…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-23 · Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, Liqiang Nie
General AI
Composed Image Retrieval (CIR) is an important image retrieval paradigm that enables users to retrieve a target image using a multimodal query that consists of a reference image and modification text. Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typ…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-24 · William Dawson, Louis Beal, Yoann Curé, Giuseppe Fisicaro, Dorian Rolland, Luigi Genovese
General AI
Large language models (LLMs) and agentic systems have recently demonstrated potential for automating scientific workflows, including atomistic simulations. However, their deployment in high-performance computing (HPC) environments remains limited by the lack of mechanisms ensuring correctness, reproducibility, and safe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-24 · Mengzhuo Chen, Junjie Wang, Fangwen Mu, Yawen Wang, Zhe Liu, Huanxiang Feng, Qing Wang
General AI
Failure attribution, i.e., identifying the responsible agent and decisive step of a failure, is particularly challenging in LLM-based multi-agent systems (MAS) due to their natural-language reasoning, nondeterministic outputs, and intricate interaction dynamics. A reliable benchmark is therefore essential to guide and …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-27 · Zahra Dehghanighobadi, Asja Fischer
General AI
Long-context reasoning is a critical capability of large language models (LLMs), enabling applications such as long-document understanding, summarization, and code generation. However, efficient autoregressive inference relies on the key-value (KV) cache, whose memory footprint grows linearly with sequence length, lead…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-28 · Zhou Hanlin, Chan Huah Yong
General AI
Long-horizon LLM tasks often fail not because a single answer is unattainable, but because knowledge states drift across rounds, intermediate commitments remain implicit, and interruption fractures the evolving evidence chain. This paper presents ADEMA as a knowledge-state orchestration architecture for long-horizon kn…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-04-28 · Hector G. Rodriguez, Marcus Rohrbach
General AI
Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering benchmarks approach saturation, reliable deployment requires satisfying low error tolerances in real-world out-of-distribution (OOD) scenarios. Precisely, selective predicti…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-05-30 · Soham Roy, Sarthakbrata Halder, Arya Bharaty, Vaibhav Bhaskar, Yash Sinha, Dhruv Kumar, Srikant Panda, Murari Mandal
Research Track B · General AI
Deceptive web content, widely instantiated across the internet and commonly known as social-engineering attacks, manipulates autonomous web agents into submitting users' personally identifiable information (PII) to attacker-controlled endpoints. In this paper, we show that social-engineering attacks are highly…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-07 · Zhengyi Zhuo, Yan Liu
General AI
Software engineering agents (SWE agents) increasingly work through tool-mediated trajectories in real repositories, yet their behavior remains difficult to characterize in concrete, observable terms. These trajectories record tool use, intermediate reasoning, evidence selection, and self-directed stopping, but they do …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-08 · Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim, Pavel Osinenko
General AI
Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a bas…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-08 · Yinyu Huang, Yilin Zhang, Sofia Michopoulou, Christopher Kipps, Rahman Attar
General AI
Alzheimer's disease (AD) progression is highly heterogeneous and is typically observed through sparse and irregular longitudinal data, posing challenges for prediction and personalised monitoring. Existing machine learning approaches have improved AD prediction using multimodal data, yet often focus on static classific…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-09 · Michele Lucente, Silvia Pascoli, Filippo Sala, Matteo Zandi
General AI
We present DarkAgents: a multi-agent system that leverages the reasoning and code-generation capabilities of large language models (LLMs), together with deterministic, tested, human-written code, to build orchestrated pipelines for theoretical astroparticle physics research. While related approaches have been proposed in…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-09 · Kevin Qinghong Lin, Batu EI, Yuhong Shi, Pan Lu, Philip Torr, James Zou
General AI
Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team weeks: hunting for context, running statistics, choosing an angle, and designing visuals. Recent agents handle individual steps well: data-sci…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-11 · Yaxin Du, Yifan Zhou, Yujie Ge, Jiajun Wang, Xianghe Pang, Shuo Tang, Tuney Zheng, Bryan Dai, Jian Yang, Siheng Chen
General AI
Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally deterministic tool workflows are unfolded into repeated model-visible decisions, consuming…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-15 · Buqiang Xu, Zirui Xue, Dianmou Chen, Chenyang Fu, Chiyu Wu, Caiying Huang, Chen Jiang, Jizhan Fang, Xinle Deng, Yijun Chen, Yunzhi Yao, Xuehai Wang, Jin Shang, Gong Yu, Ningyu Zhang
General AI
As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alter layouts, introducing prefix mismatches and cache invalidation. This…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-16 · Nicola Franco
General AI
We evaluate the adversarial robustness of two frontier large language models (LLMs) developed by Anthropic, Fable 5 and Opus 4.8, against four families of automated jailbreak attack across 7,826 harmful intents spanning a ten-category harm taxonomy. Using the HackAgent red-teaming framework, hundreds of thousands of ad…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-16 · Xingming Li, Ao Cheng, Qiyao Sun, Xixiang He, Xuanyu Ji, Runke Huang, Qingyong Hu
General AI
When vision contradicts text, multimodal large language models (MLLMs) consistently favor text, even when images provide clear evidence otherwise. This bias poses risks for applications requiring visual grounding, yet its cause remains unclear. In this paper, we uncover a surprising finding: models often get it right i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-16 · Shanda Li, Qiuhong Anna Wei, Jingwu Tang, Valerie Chen, Nihar B Shah, Tim Dettmers, Yiming Yang, Ameet Talwalkar
General AI
Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale due to their reliance on substantial manual effort for data curation and evaluation. We …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-17 · H. M. Sabbir Ahmad, Ehsan Sabouni, Emrullah Celik, Zean Wan, Damola Ajeyemi, Christos G. Cassandras, Wenchao Li
General AI
We propose a mixed-reality, hardware-in-the-loop (HIL) testbed for autonomous vehicles that seamlessly integrates a physical testbed of mobile robots with a high-fidelity simulation environment. The virtual simulation enables the creation of diverse, safety-critical driving scenarios to validate state-of-the-art percep…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-07-01 · Yuting Zhang, Yanbei Liu, Zhitao Xiao, Lei Geng, Yanwei Pang, Xiao Wang
Research Track A · General AI
Self-supervised Continual Graph Learning (CGL) aims to successively learn from a graph sequence with different tasks without label supervision, a paradigm that has attracted widespread attention. Most existing self-supervised CGL methods rely on instance-level consistency objectives that enforce stability of individua…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.2
2026-06-23 · Anand Kamat, Daniel Blake, Brent M. Werness
General AI
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet they remain prone to generating hallucinations. Detecting these hallucinations is critical for deploying LLMs reliably in high-stakes applications. We present Grad Detect, a gradient-based approach for predicting hallucinat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.2
2026-06-23 · Simone Gallivanone, Hossein Khodadadi, Mauro Dore, Mauro Medda, Nicola Franco
General AI
We introduce a large-scale, open-source dataset of pre-generated adversarial attacks for vision-language models (VLMs). The dataset is designed to be diverse, representative, and practical, extending existing benchmarks by covering 10 high-level categories and 55 subcategories of harmful intents. Our primary goal is to…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.2
2026-06-24 · Yves Ferstler, Adam Podoxin, Ty Brassington, Roman Grundkiewicz, Maite Taboada, Marzena Karpinska
General AI
AI translation of literary works is increasingly common. While the content may be rendered adequately, we do not know enough about how readers experience it in terms of immersiveness and literary effect, aspects poorly captured by automatic machine translation metrics or human evaluation targeting fluency and adequacy.…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.2
2026-06-24 · Yuxing Cheng, Yuan Wu, Yi Chang
Research Track A · General AI
Vision-language models (VLMs) have achieved strong performance on OCR-based benchmarks and increasingly focused on text-rich understanding, but their robustness under controlled visual degradation remains insufficiently understood. This gap is critical for OCR reasoning, where visual corruption can induce OCR errors an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.2
2026-06-24 · Alexander Schperberg, Shivam K. Panda, Abraham P. Vinod, M. K. Jawed, Stefano Di Cairano
General AI
We present RoboAtlas, a contextual Active SLAM framework that adaptively balances geometric exploration and semantic reasoning using a scalable 3D semantic mapping system, OpenRoboVox. RoboAtlas integrates frontier exploration, global semantic-map reasoning, and egocentric VLM-based reasoning through a contextual multi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-03-30 · Shivnath Tathe
Research Track A
Fixed representational capacity is a fundamental constraint in continual learning: practitioners must guess an appropriate model width before training, without knowing how many distinct concepts the data contains. We propose LACE (Loss-Adaptive Capacity Expansion), a simple online mechanism that expands a model's repre…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-04-01 · Mohammad R. Abu Ayyash
Research Track A · General AI
We present Brainstacks, a modular architecture for continual multi-domain fine-tuning of large language models that packages domain expertise as frozen adapter stacks composing additively on a shared frozen base at inference. Five interlocking components: (1) MoE-LoRA with Shazeer-style noisy top-2 routing across all s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-04-01 · Henry Peng Zou, Chunyu Miao, Wei-Chieh Huang, Yankai Chen, Yue Zhou, Hanrong Zhang, Yaozu Wu, Liancheng Fang, Zhengyao Gu, Zhen Zhang, Kening Zheng, Fangxin Wang, Yi Nian, Shanghao Li, Wenzhe Fan, Langzhou He, Weizhi Zhang, Xue Liu, Philip S. Yu
Research Track B · General AI
As LLM agents transition from short, static problem solving to executing complex, long-horizon tasks in dynamic environments, the ability to handle user interruptions, such as adding requirements or revising goals, during mid-task execution is becoming a core requirement for realistic deployment. However, existing bench…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-04-14 · Amar Gahir, Varshil Patel, Shreyank N Gowda
Research Track A · General AI
Deep neural networks are typically trained by uniformly sampling large datasets across epochs, despite evidence that not all samples contribute equally throughout learning. Recent work shows that progressively reducing the amount of training data can improve efficiency and generalization, but existing methods rely on f…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-04-16 · Tingjia Miao, Wenkai Jin, Muhua Zhang, Jinxin Tan, Yuelin Hu, Tu Guo, Jiejun Zhang, Yuhan Wang, Wenbo Li, Yinuo Gao, Shuo Chen, Weiqi Jiang, Yayun Hu, Zixing Lei, Xianghe Pang, Zexi Liu, Yuzhi Zhang, Linfeng Zhang, Kun Chen, Wei Wang, Weinan E, Siheng Chen
General AI
The paradigm of agentic science requires AI systems to conduct robust reasoning and engage in long-horizon, autonomous exploration. However, current scientific benchmarks remain confined to domain knowledge comprehension and complex reasoning, failing to evaluate the exploratory nature and procedural complexity of real…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-04-16 · Guy Kaplan, Zorik Gekhman, Zhen Zhu, Lotem Rozner, Yuval Reif, Swabha Swayamdipta, Derek Hoiem, Roy Schwartz
Research Track A · General AI
Large language models are prone to hallucinating factually incorrect statements. A key source of these errors is exposure to new factual information through supervised fine-tuning (SFT), which can increase hallucinations w.r.t. knowledge acquired during pre-training. In this work, we explore whether SFT-induced halluci…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-05-01 · Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin
General AI
Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-05-07 · Yuxing Liu, Jianyu Wang, Tong Zhang
Research Track A · General AI
Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., forgetting less while achieving the same o…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-05-09 · Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, H. Vincent Poor, Christopher G. Brinton
General AI
Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraint…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-05-10 · Zhiqing Zhong, Zhijing Ye, Jiamin Wang, Xiaodong Yu
Research Track B · General AI
Closed-loop tool-using agents are increasingly evaluated in executable web, code, and micro-task environments, but benchmark reports often conflate workloads, action-generating drivers, and the evidence admitted for systems-facing claims. We present an executable benchmarking suite that makes these objects explicit und…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-05-10 · Yilin Zhang, Yingkai Hua, Chunyu Wei, Xin Wang, Yueguo Chen
Research Track B · General AI
Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements. Existing approaches either detect deception without task integration or document attacks without proposing defenses. We formalize deception-aware web agent defense and pr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-05-12 · Zhong Guan, Yongjian Guo, Haoran Sun, Wen Huang, Shuai Di, Xiong Jun Wu, Likang Wu, Hongke Zhao
General AI
Asynchronous reinforcement learning improves rollout throughput for large language model agents by decoupling sample generation from policy optimization, but it also introduces a critical failure mode for PPO-style off-policy correction. In heterogeneous training systems, the total importance ratio should ideally be de…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-05-14 · Tri Cao, Yulin Chen, Hieu Cao, Yibo Li, Khoi Le, Thong Nguyen, Yuexin Li, Yufei He, Yue Liu, Shuicheng Yan, Bryan Hooi
Research Track B · General AI
Web agents can autonomously complete online tasks by interacting with websites, but their exposure to open web environments makes them vulnerable to prompt injection attacks embedded in HTML content or visual interfaces. Existing guard models still suffer from limited generalization to unseen domains and attack pattern…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-05-15 · Chinmay Savadikar, Mingyu Zhao, Yuanzheng Zhu, Han Li, Shuang Xie, Alberto Castelo, Tianfu Wu, Lingyun Wang
Research Track B · General AI
Developing and evaluating e-commerce web agents requires environments that preserve meaningful task structure while enabling controllable, reproducible, and scalable scientific comparison. Existing methodologies force a tradeoff: live storefronts provide realism but are non-stationary, difficult to inspect, and irrepro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-05-15 · Mike Wong, Kevin Hsieh, Suman Nath, Ravi Netravali
Research Track B · General AI
Skim is a speculative execution framework for web agents that exploits the predictable structure of purpose-built websites. Today's web-agent expense is not intrinsic to the tasks but a property of how agents are composed: frontier-model inference, browser rendering, and ReAct-style planning are applied to every step o…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-05-18 · Boyuan Sun, Bowen Yin, Yuanming Li, Xihan Wei, Qibin Hou
General AI
We present SWIM (See What I Mean), a novel training strategy that aligns vision and language representations to enable fine-grained object understanding solely from textual prompts. Unlike existing approaches that require explicit visual prompts, such as masks or points, SWIM leverages mask supervision only during trai…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-05-28 · Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang, Zerui Chen, Qinggang Zhang, Jinsong Su
General AI
Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suf…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-05-29 · Dongxin Guo, Jikun Wu, Siu Ming Yiu
Research Track B · General AI
Extended chain-of-thought reasoning can degrade performance on deterministic state-tracking tasks, not because of preference biases, but because of limits rooted in the information-theoretic capacity of decoder-only attention. We establish: (1) an Attention Bottleneck Theorem with a complementary achievability construction, bounding …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-06-08 · Gianluca Barmina, Annemette Broch Pirchert, Andrea Blasi Núñez, Lukas Galke Poech, Peter Schneider-Kamp
Research Track A · General AI
As deep learning models scale, managing, inspecting, and modifying large checkpoints has become increasingly challenging. Researchers often need to alter model weights for layer restructuring, precision casting, low-rank factorization, and architectural debugging, yet these workflows often rely on fragile ad-hoc Python…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-06-09 · Jebacyril Arockiaraj, Dhruv Parikh, Jayashree Adivarahan, Rajgopal Kannan, Viktor Prasanna
Research Track A · General AI
Federated continual learning (FCL) must learn from distributed task streams under limited resources, such as communication, computation, memory, and label availability. Existing FCL methods often rely on repeated local optimization, replay, and full supervision. Analytic alternatives avoid iterative training and replay…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-06-12 · Mary Isabelle Wisell, Nicholas Jacobs, Aayush Manandhar, Salimeh Yasaei Sekeh
Research Track A · General AI
Multi-source transfer learning faces a fundamental scalability bottleneck: existing approaches require either loading all K source models into memory simultaneously during parameter fusion, requiring O(K) memory, or deploying all models at inference time, making production deployment infeasible. We propose GRASP (Gradi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.0
2026-06-29 · Jiacheng Zhang, Haoyu He, Sen Zhang, Shen Wang, Xiaolei Xu, Yuhao Sun, Meng Shen, Feng Liu
General AI
In real-world applications, guardrails are often expected to identify unsafe user-model interactions according to application-specific safety policies, rather than relying on predefined risk taxonomies. In this work, we study this setting under the paradigm of in-context policy guardrailing, where guardrails predict sa…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.9
2026-06-25 · Minbyul Jeong
Research Track B · General AI
Web-agent benchmarks overwhelmingly measure depth (pinning one obscure answer behind a chain of constraints), while breadth (exhaustively enumerating a closed set and filling each item's attributes) is barely evaluated, especially outside English. Breadth is also hard to build: certifying that a gold set is complete…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.9
2026-06-26 · Kuo-Chung Peng, Jiun-Cheng Jiang, Chun-Hua Lin, Tai-Yue Li, Nan-Yow Chen, Samuel Yen-Chi Chen
General AI
Traffic matrices (TMs) capture network-wide origin-destination demand and are central to traffic engineering, yet accurate whole-matrix forecasting remains challenging when prediction must be performed under the memory, update, and training-budget constraints of online network control. This paper investigates whether c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.8
2026-05-04 · Mohamad Khajezade, Fatemeh H. Fard, Mohamed Sami Shehata
General AI
Cross-language code clone detection (X-CCD) is challenging because semantically equivalent programs written in different languages often share little surface similarity. Although large language models (LLMs) have shown promise for semantic clone detection, their use as black-box systems raises concerns about cost, repr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-04 · Xin Zhang, Qiqi Tao, Jiawei Du, Moyun Liu, Joey Tianyi Zhou
General AI
Continuous latent-space reasoning offers a compact alternative to textual chain-of-thought for multimodal models, enabling high-dimensional visual evidence to be integrated without explicit reasoning tokens. However, we identify a previously overlooked optimization pathology in existing latent visual reasoning methods:…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-06 · Pietro Bonazzi, Youssef Ahmed, Daniel Eckert, Andrea Ronco, Junjie Zeng, Dengxin Dai, Michele Magno
General AI
Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multi-modal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commer…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-07 · Hao Dong, Hongzhao Li, Shupan Li, Muhammad Haris Khan, Eleni Chatzi, Olga Fink
General AI
Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-07 · Yangfu Zhu, Zitong Han, Nianwen Ning, Yuting Wei, Yuandong Wang, Hang Feng, Zhenzhou Shao
General AI
Multimodal personality understanding plays a critical role in human-centered artificial intelligence. Previous work mainly focuses on learning rich multimodal representations for video personality understanding. However, they often suffer from potential harm caused by subject bias (e.g., observable age and unobservable mental …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-07 · Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang
General AI
Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, prim…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-11 · Kainat Riaz, Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Ayesha Mohsin, Aqib Riaz, Ali Subhan, John M. Cioffi
General AI
Automated scientific discovery using large language models relies on identifying genuinely novel solutions. Standard reinforcement learning penalizes high-variance mutations, which leads the policy to prioritize familiar patterns. As a result, the maximum reward plateaus even as the average reward increases. Overcoming…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-12 · Junxian Li, Kai Liu, Zizhong Ding, Zhixin Wang, Zhikai Chen, Renjing Pei, Yulun Zhang
General AI
The development of separate-encoder Unified multimodal models (UMMs) comes with a rapidly growing inference cost due to dense visual token processing. In this paper, we focus on understanding-side visual token reduction for improving the efficiency of separate-encoder UMMs. While this topic has been widely studied for …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-12 · Guohui Zhang, XiaoXiao Ma, Jie Huang, Hang Xu, Hu Yu, Siming Fu, Yuming Li, Zeyue Xue, Lin Song, Haoyang Huang, Nan Duan, Feng Zhao
General AI
Recent advances in joint audio-video generation have been remarkable, yet real-world applications demand strong per-modality fidelity, cross-modal alignment, and fine-grained synchronization. Reinforcement Learning (RL) offers a promising paradigm, but its extension to multi-objective and multi-modal joint audio-video …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-20 · Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini, Christos Kozyrakis
Research Track B · General AI
Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-28 · Chong Bao, Shichen Liu, Lijun Yu, David Futschik, Stylianos Moschoglou, Shefali Srivastava, Ziqian Bai, Feitong Tan, Guofeng Zhang, Zhaopeng Cui, Sean Fanello, Yinda Zhang
General AI
Digital humans are fundamental to immersive interaction, yet creating a unified model for holistic modalities, including text, audio, motion, and visual content, remains an open challenge. In this paper, we present Archon, a fully pretrained, human-centric unified multimodal model for holistic avatar generation. Archon…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-28 · Daniel Dold, Emanuel Sommer, Julius Kobialka, Oliver Dürr, David Rügamer
General AI
While parameter-efficient fine-tuning methods like low-rank adaptation (LoRA) are standard for large language models, principled estimation of epistemic uncertainty remains challenging. Recent results in the LoRA regime suggest that discrete multi-mode approaches such as deep ensembles offer little benefit over single-…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-28 · Felix Zhou, Anay Mehrotra, Quanquan C. Liu
General AI
Frontier reasoning models are produced by posttraining base language models with reinforcement learning. Recent work has challenged this by showing that sampling from a sharpened version of the base model's distribution, a so-called power distribution, elicits comparable reasoning without additional training, curated d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-05-28 · Xiaona Zhou, Muntasir Wahed, Tianjiao Yu, Constantin Brif, Ismini Lourentzou
General AI
Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large language or multimodal models to finding abnormal patterns in sequential data. Public anomaly detection benchmarks typically provide interval a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-06-29 · Ting-Wen Ko, Jonas Geiping
General AI
Large language models (LLMs) are increasingly used in open-ended multi-agent settings, but the long-run dynamics of model-model interaction remain poorly understood. We study whether open-ended LLM discussions exhibit attractor-like behavior, i.e. topic-independent stable sets of behaviors which conversations settle i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-06-29 · Yulin Zhou, Yimeng Wang, Nengyu Wang, Shaojia Xing, Shiyun Tu, Xiang Li, Jingkai Zhang, Ningbo Jiang, Yuankai Lin, Hua Yang, Xiangrui Zeng, Zhouping Yin
General AI
General-purpose robot policies should be modeled as dynamical systems, yet many VLA and generative imitation policies still rely on present observations or short windows. This Markovian shortcut fails in memory-dependent manipulation: identical observations can demand different actions after different histories. We pre…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-06-29 · Iliana Fayolle, Sihem Bouhenniche, Samuel Pélissier, Pierre Laperdrix, Clémentine Maurice, Walter Rudametkin
Research Track B · General AI
Since 2023, a new class of bots has emerged: Web Agents. They can automate complex tasks on the Web, going beyond traditional browser automation tools such as Selenium, Puppeteer, or Playwright. Leveraging large language models (LLMs), these agents are capable of solving anti-bot mechanisms, mimicking human behavior, a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-06-29 · Mohit Raghavendra, Anisha Gunjal, Aakash Sabharwal, Yunzhong He
Research Track A · General AI
We introduce SWE-Interact, a new testbed for evaluating coding agents on multi-turn, interactive, user-driven software engineering tasks. Existing frontier SWE benchmarks typically provide complete requirements upfront and evaluate agents on autonomous implementation. In contrast, SWE-Interact places agents in a realis…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.8
2026-06-29 · Bryce Grant, Aryeh Rothenberg, Logan Senning, Zonghe Chua, Zach Patterson, Peng Wang
General AI
We present Sequential Planning via Anchored Robotic Keypoints, SPARK, a training-free neurosymbolic manipulation system that reaches 43.7% on six LIBERO-PRO position & task cells, more than doubling CaP-Agent0 and Vision-Language-Action (VLA) baselines. CaP-Agent0, a multi-turn code-generation agent, achieves 18.2% by…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.6
2026-07-02 · Xuehui Wang, Xuankun Yang, Wei Shen
General AI
Visual token pruning is a crucial strategy for accelerating VLMs by compressing redundant image patches, yet existing methods often fail to preserve critical cues under dense instructions and fine-grained queries. In this paper, we investigate this failure and identify two underlying bottlenecks: the widespread dispers…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.6
2026-07-02 · Juanwu Lu, Junyu Zhu, Ziran Wang
General AI
Realistic traffic simulation requires agents that imitate logged behavior and can also be steered along interpretable axes. Such controllability enables engineers to isolate variables, reproduce specific edge cases, and test autonomous systems without real-world risk. We introduce Controllable Neural Variational Agents…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.6
2026-07-02 · Zhuowei Chen, Xiang Lorraine Li
General AI
Post-training large language models (LLMs) without real-world interaction feedback or human-labeled supervision remains challenging, particularly in specialized domains where expert annotations are costly to obtain. Recent annotation-free self-evolution methods address this by using the model's own outputs as supervisi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.6
2026-07-02 · Arman Ghaffarizadeh, Danyal Mohaddes, Aliakbar Izadkhah, Shahriar Noroozizadeh
General AI
LLM agents will increasingly act in socially structured settings where role, audience, and relational context can shape what is advantageous or costly to say. We study whether such social structure, without any explicit objective in the prompt, changes what an agent expresses publicly relative to an off-the-record (OTR…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-02-02 · Yanjia Huang, Yunuo Chen, Ying Jiang, Jinru Han, Zhengzhong Tu, Yin Yang, Chenfanfu Jiang
General AI
The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-03-04 · Hung Tran, Langston Nashold, Rayan Krishnan, Antoine Bigeard, Alex Gu
Research Track B · General AI
Code generation has emerged as one of AI's highest-impact use cases, yet existing benchmarks measure isolated tasks rather than the complete "zero-to-one" process of building a working application from scratch. We introduce Vibe Code Bench, a benchmark of 100 web application specifications (50 public validation, 50 hel…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.5
2026-03-19 · Haochen Zhao, Shaoyang Cui
Research Track B · General AI
Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.5
2026-03-22 · Liang Ding
Research Track B · General AI
LLM-as-Judge evaluation fails agent tasks because a fixed rubric cannot capture what matters for this task: code debugging demands Correctness and Error Handling; web navigation demands Goal Alignment and Action Efficiency. We present ADARUBRIC, which closes this gap by generating task-specific evaluation rubrics on th…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-03-29 · Meituan LongCat Team, Bin Xiao, Chao Wang, Chengjiang Li, Chi Zhang, Chong Peng, Hang Yu, Hao Yang, Haonan Yan, Haoze Sun, Haozhe Zhao, Hong Liu, Hui Su, Jiaqi Zhang, Jiawei Wang, Jing Li, Kefeng Zhang, Manyuan Zhang, Minhao Jing, Peng Pei, Quan Chen, Taofeng Xue, Tongxin Pan, Xiaotong Li, Xiaoyang Li, Xiaoyu Zhao, Xing Hu, Xinyang Lin, Xunliang Cai, Yan Bai, Yan Feng, Yanjie Li, Yao Qiu, Yerui Sun, Yifan Lu, Ying Luo, Yipeng Mei, Yitian Chen, Yuchen Xie, Yufang Liu, Yufei Chen, Yulei Qian, Yuqi Peng, Zhihang Yu, Zhixiong Han, Changran Wang, Chen Chen, Dian Zheng, Fengjiao Chen, Ge Yang, Haowei Guo, Haozhe Wang, Hongyu Li, Huicheng Jiang, Jiale Hong, Jialv Zou, Jiamu Li, Jianping Lin, Jiaxing Liu, Jie Yang, Jing Jin, Jun Kuang, Juncheng She, Kunming Luo, Kuofeng Gao, Lin Qiu, Linsen Guo, Mianqiu Huang, Qi Li, Qian Wang, Rumei Li, Siyu Ren, Wei Wang, Wenlong He, Xi Chen, Xiao Liu, Xiaoyu Li, Xu Huang, Xuanyu Zhu, Xuezhi Cao, Yaoming Zhu, Yifei Cao, Yimeng Jia, Yizhen Jiang, Yufei Gao, Zeyang Hu, Zhenlong Yuan, Zijian Zhang, Ziwen Wang
General AI
The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented architectures and subopt…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-03-30 · Alkis Sygkounas, Rishi Hazra, Andreas Persson, Pedro Zuidberg Dos Martires, Amy Loutfi
Research Track A · General AI
A central challenge in building continually improving agents is that training environments are typically static or manually constructed. This restricts continual learning and generalization beyond the training distribution. We address this with COvolve, a co-evolutionary framework that leverages large language models (…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-03-30 · Deepak Akkil, Mowafak Allaham, Amal Raj, Tamer Abuelsaad, Ravi Kokku
Research Track B · General AI
Reliable evaluation of AI agents operating in complex, real-world environments requires methodologies that are robust, transparent, and contextually aligned with the tasks agents are intended to perform. This study identifies persistent shortcomings in existing AI agent evaluation practices that are particularly acute …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-07 · Chenghao Li, Jun Liu, Songbo Zhang, Huadong Jian, Hao Ni, Lik-Hang Lee, Sung-Ho Bae, Guoqing Wang, Yang Yang, Chaoning Zhang
General AI
Multimodal LLM agents operating in complex game environments must continually reuse past experience to solve new tasks efficiently. In this work, we propose Echo, a transfer-oriented memory framework that enables agents to derive actionable knowledge from prior interactions rather than treating memory as a passive repo…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-09 · Ziqi Cai, Taoyu Yang, Zheng Chang, Si Li, Han Jiang, Shuchen Weng, Boxin Shi
General AI
Diffusion models have achieved remarkable progress in video generation, but their controllability remains a major limitation. Key scene factors such as layout, lighting, and camera trajectory are often entangled or only weakly modeled, restricting their applicability in domains like filmmaking and virtual production wh…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-10 · Jingyu Zhang, Tianjian Li, William Jurayj, Hongyuan Zhan, Benjamin Van Durme, Daniel Khashabi
General AI
Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant parad…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-12 · Yu Li, Xiaoran Shang, Qizhi Pei, Yun Zhu, Xin Gao, Honglin Lin, Zhanping Zhong, Zhuoshi Pan, Zheng Liu, Xiaoyang Wang, Conghui He, Dahua Lin, Feng Zhao, Lijun Wu
General AI
Post-training data plays a pivotal role in shaping the capabilities of Large Language Models (LLMs), yet datasets are often treated as isolated artifacts, overlooking the systemic connections that underlie their evolution. To disentangle these complex relationships, we introduce the concept of data lineage to the LLM e…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-20 · Jiaqi Wang, Haoge Deng, Ting Pan, Yang Liu, Chengyuan Wang, Fan Zhang, Yonggang Qi, Xinlong Wang
General AI
Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains largely unexplored. We observe that naively applying GRPO to UDM leads to training instability and marginal performance gains. To address thi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-24 · Shaoang Li, Yanhang Shi, Yufei Li, Mingfu Liang, Xiaohan Wei, Yunchen Pu, Fei Tian, Chonglin Sun, Frank Shyu, Luke Simon, Sandeep Pandey, Xi Liu, Jian Li
General AI
Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or rewriting the input, which can discard or …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-04-27 · Han Wang, Xiaodong Yu, Jialian Wu, Jiang Liu, Ximeng Sun, Mohit Bansal, Zicheng Liu
General AI
Large language models (LLMs) achieve strong reasoning performance by allocating substantial computation at inference time, often generating long and verbose reasoning traces. While recent work on efficient reasoning reduces this overhead through length-based rewards or pruning, many approaches are post-trained under a …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-05-01 · Zi-Bo Qin, Feng-Feng Wei, Tai-You Chen, Wei-Neng Chen
General AI
Distributed blackbox consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns, which often struggle…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-05-12 · Phu-Hoa Pham, Chi-Nguyen Tran, Nguyen Lam Phu Quy, Dao Sy Duy Minh, Huynh Trung Kiet, Long Tran-Thanh
Research Track A · General AI
Streaming decision trees are natural candidates for open-world continual learning, as they perform local updates and enjoy bounded memory and static decision boundaries. Despite these advantages, they still fail in online class-incremental learning due to two coupled miscalibrations: (i) their split criterion grows unreliable as th…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-06-04 · Yang Li, Jiaxiang Liu, Jiang Cai, Mingkun Xu
General AI
A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting now. Standard tool-use agents answer the literal question and stop. AURA inserts an inference step between scene perception and tool use th…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-06-04 · Eros Fanì, Oğuzhan Ersoy
General AI
Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks, and achieve remarkable results on various specialized tasks via domain-expert LLMs. With the ever-growing list of available LLMs, inference routers are being proposed to select the most appropriate LLM for each prompt. H…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-06-08 · Daniel Vila-Cruz, Laura Morán-Fernández, Verónica Bolón-Canedo
Research Track A
We present HydraCIL, a decoupled continual learning model based on prototype-guided multi-head classifiers, targeting sustainable deployment in embedded and resource-constrained environments. While most Class-Incremental Learning (CIL) methods rely on powerful hardware and long retraining cycles, real-world systems, su…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-06-08 · Andries Rosseau, Robert Müller, Ann Nowé
Research Track A · General AI
Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, eventually limiting further learning. We relate plasticity to the empirical Neural Tangent Kernel, and identify dynamical isometry (the condition that layer-wise Jacobian singular values remain close to on…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-06-15 · Prasanth YSS, Zhichen Ren, Rasa Hosseinzadeh, Ilan Gofman, Yuqi Chen, Zhaoyan Liu, Guangwei Yu, Jesse C. Cresswell, Satya Krishna Gorti
General AI
Reinforcement learning with verifiable rewards (RLVR) improves language-model reasoning, but GRPO-style optimization remains prone to collapse. We analyse this instability through token-level gradient dynamics, deriving a taxonomy that predicts how updates affect next-token probabilities and entropy. The taxonomy shows…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-06-16 · Kathrin Korte, Christian Medeiros Adriano, Joachim Winther Pedersen, Eleni Nisioti, Sebastian Risi
Research Track A
Compositional learning systems must balance plasticity, the ability to acquire new knowledge, with stability, the preservation of previously learned components, especially when tasks share structure and risk interference. We study how modular architecture, task similarity, and representational dimensionality jointly sh…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-06-16 · Yatai Ji, An-Chieh Cheng, Yang Fu, Yukang Chen, Han Zhang, Zhaojing Yang, Wei Huang, Ka Chun Cheung, Song Han, Vidya Nariyambut Murali, Pavlo Molchanov, Jan Kautz, Simon See, Hongxu Yin, Ping Luo, Sifei Liu
General AI
Spatial VLMs have made substantial progress in geometric perception, yet complex spatial reasoning requiring multi-step inference over depth, distance, and scene relations remains challenging. Moreover, different spatial queries call for fundamentally different strategies: some are best addressed through purely linguis…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.5
2026-06-17 · Mengyu Ye, Keito Kudo, Wataru Ikeda, Ryosuke Matsuda, Keisuke Sakaguchi, Jun Suzuki
General AI
Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-06-29 · Xinlei Yu, Gen Li, Qingyi Si, Guibin Zhang, Yuqi Xu, Congcong Wang, Shuai Dong, Kaiwen Tuo, Xiangyu Zeng, Kaituo Feng, Qunzhong Wang, Yang Shi, Xiaobin Hu, Xiangyu Yue, Jiaqi Wang, Shuicheng Yan
Research Track A · General AI
On-policy distillation (OPD) offers superior capacity transfer by supervising student-sampled trajectories with dense token-level signals. To furnish high-quality supervision sources and thereby elevate the performance frontier of distillation, an intuitive direction is to infuse privileged information to either teache…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-06-29 · Yiting Hu, Lingjie Duan
Research Track A · General AI
Continual learning (CL), where a model is trained on a sequence of data tasks, is increasingly being adopted across key fields such as large language models and image recognition, yet it remains highly vulnerable to data poisoning that triggers learning divergence or severe excess risk. Despite these threats, a princip…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.4
2026-06-22 · Shanhui Zhao, Jiacheng Liu, Guohong Liu, Jichao Yan, Jialei Ye, Yuhao Yang, Hao Wen, Shizuo Tian, Yizhen Yuan, Yuxuan Chen, Yunxin Liu, Ju Ren, Ya-Qin Zhang, Chao Huang, Yao Guo, Yuanchun Li
General AI
AI agents are driving a new software paradigm, with the ability to autonomously call tools, extract information, manage memory, and complete tasks that span applications and data sources. Most existing end-user operating systems, however, are designed for application-centric workflows and offer little native support fo…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.4
2026-06-23 · Jiayi Lei, Yuandong Pu, Xingyu Han, Rongpeng Zhu, Jing Xu, Jinyao Wang, Zijian Zhou, Bin Fu, Yuewen Cao, Yihao Liu, Yongsheng Li
General AI
Text-to-image (T2I) generation models have achieved remarkable progress in producing visually realistic images from natural language prompts. Yet it remains unclear whether their success reflects genuine causal understanding or sophisticated pattern matching over visual-textual correlations. Inspired by Russell's induc…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.4
2026-06-23 · Yuru Wang, Lejun Cheng, Yuxin Zuo, Sihang Zeng, Bingxiang He, Che Jiang, Junlin Yang, Yuchong Wang, Kaikai Zhao, Weifeng Huang, Kai Tian, Zhenzhao Yuan, Jincheng Zhong, Weizhi Wang, Ning Ding, Bowen Zhou, Kaiyan Zhang
General AI
We introduce NatureBench, a cross-discipline benchmark of 90 tasks distilled from peer-reviewed Nature-family publications, designed to evaluate whether AI coding agents can move beyond reproduction toward discovery on real scientific problems. NatureBench is built on NatureGym, an automated pipeline that constructs a …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.4
2026-06-23 · Bingxuan Li, Yining Hong, Cheng Qian, Hyeonjeong Ha, Jiateng Liu, Zhenhailong Wang, Yue Guo, Yunzhu Li, Heng Ji
General AI
Physical interactions follow a long-tailed distribution: a set of common and regular interactions dominates human experience and visual data, while a broad spectrum of rare and irregular interactions remains underrepresented. Although recent visual world models, including image and video generation models, achieve impr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.4
2026-06-24 · Fangzheng Li, Aimin Zhang, Chen Lv
General AI
Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a reproducible phenomenon observed in a production Agent system: when Tool Calling and JSON Schema constraints are simultane…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.4
2026-06-24 · Simon Kurgan, Evan Wang, Eric Leonen, Sophie Szeto, Luke Alexander, Artemii Remizov, Jarod Alper, Giovanni Inchiostro, Vasily Ilin
General AI
Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-03-25 · Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu
Research Track A · General AI
Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks remains fundamentally limited by the absence of long-term memory, causal failure attribution, and dynamic intervention c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-03-26 · Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li
General AI
Human driving behavior is inherently personal: it is shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectiv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-03-27 · Wonyoung Lee, Wooseong Jeong, Kuk-Jin Yoon
General AI
Model merging combines independently fine-tuned checkpoints without joint multi-task training. In the era of foundation models, fine-tuning with Low-Rank Adaptation (LoRA) is prevalent, making LoRA merging a promising target. Existing approaches can work in homogeneous settings where all target tasks are classification …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-03-30 · Haozhe Qi, Kevin Qu, Mahdi Rad, Rui Wang, Alexander Mathis, Marc Pollefeys
General AI
Long video understanding remains challenging for Multi-modal Large Language Models (MLLMs) due to high memory costs and context-length limits. Prior approaches mitigate this by scoring and selecting frames/tokens within short clips, but they lack a principled mechanism to (i) compare relevance across distant video clip…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-03-30 · Philip Schroeder, Thomas Weng, Karl Schmeckpeper, Eric Rosen, Stephen Hart, Ondrej Biza
General AI
Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enablin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-02 · Minda Zhao, Yutong Yang, Chufei Peng, Rachel Gonsalves, Weiyue Li, Ruyi Yang, Zhixi Liu, Mengyu Wang
General AI
Emotional tone is pervasive in human communication, yet its influence on large language model (LLM) behaviour remains unclear. Here, we examine how first-person emotional framing in user-side queries affects LLM performance across six benchmark domains, including mathematical reasoning, medical question answering, readi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-02 · Klemens Iten, Bruce Lee, Chenhao Li, Lenart Treven, Andreas Krause, Bhavya Sukhija
General AI
Learning-based control methods typically assume stationary system dynamics, an assumption often violated in real-world systems due to drift, wear, or changing operating conditions. We study reinforcement learning for control under time-varying dynamics. We consider a continual model-based reinforcement learning setting…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-02 · Masafumi Enomoto, Ryoma Obara, Haochen Zhang, Masafumi Oyamada
Research Track B · General AI
Web agents based on large language models (LLMs) rely on observations of web pages -- commonly represented as HTML -- as the basis for identifying available actions and planning subsequent steps. Prior work has treated the verbosity of HTML as an obstacle to performance and adopted observation reduction as a standard p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-04 · Hessen Bougueffa Eutamene, Abdellah Zakaria Sellam, Abdelmalik Taleb-Ahmed, Abdenour Hadid
Research Track A · General AI
Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models; however, while pixel-level artifacts vary across models, frequency-domain signatures exhibit greater consistency, providing a promising foundation for cross-generator…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-06 · Parsa Hosseini, Sumit Nawathe, Mahdi Salmani, Meisam Razaviyayn, Soheil Feizi
General AI
Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the final answer. In this wo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-06 · Mingzhe Du, Luu Anh Tuan, Dong Huang, See-kiong Ng
Research Track A · General AI
The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries w…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-06 · LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, Aviral Kumar
General AI
Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-09 · Kaiyuan Tian, Yu Tang, Gongqingjian Jiang, Baihui Liu, Yifu Gao, Xialin Su, Linbo Qiao, Dongsheng Li
General AI
Full-parameter fine-tuning of large language models is constrained by substantial GPU memory requirements. Low-rank adaptation methods mitigate this challenge by updating only a subset of parameters. However, these approaches often limit model expressiveness and yield lower performance than full-parameter fine-tuning. …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-09 · Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou
General AI
We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-13 · Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Lorenzo Sia, Nicolas Richet, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger
General AI
Using behavioural science, health interventions focus on behaviour change by providing a framework to help patients acquire and maintain healthy habits that improve medical outcomes. In-person interventions are costly and difficult to scale, especially in resource-limited regions. Digital health interventions offer a c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-13 · Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia
General AI
Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex: as existing approaches vary substantially in architectures, training data, embodiment configurations, and benchmark-specific en…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-14 · Anne Lee, Gurudutt Hosangadi
Research Track A · General AI
The rapid advancement of AI has changed the character of HPC usage such as dimensioning, provisioning, and execution. Not only has energy demand been amplified, but existing rudimentary continual learning capabilities limit the ability of AI to effectively manage HPCs. This paper reviews emerging directions beyond monolith…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-14 · Guoxin Chen, Jie Chen, Lei Chen, Jiale Zhao, Fanzhe Meng, Wayne Xin Zhao, Ruihua Song, Cheng Chen, Ji-Rong Wen, Kai Jia
General AI
Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-16 · Kiran Purohit, Ramasuri Narayanam, Soumyabrata Pal
General AI
Speculative decoding (SD) accelerates large language model inference by allowing a lightweight draft model to propose outputs that a stronger target model verifies. However, its token-centric nature allows erroneous steps to propagate. Prior approaches mitigate this using external reward models, but incur additional la…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-16 · Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Ge Lan, Yue Wang
General AI
Recent advances in reinforcement learning (RL) have improved the reasoning capabilities of large language models (LLMs) and vision-language models (VLMs). However, the widely used Group Relative Policy Optimization (GRPO) consistently suffers from entropy collapse, causing the policy to converge prematurely and lose di…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-17 · Ruiyang Wang, Hao-Lun Hsu, Jiwoo Kim, Miroslav Pajic
General AI
Coordinating multi-robot systems (MRS) to search in unknown environments is particularly challenging for tasks that require semantic reasoning beyond geometric exploration. Classical coordination strategies rely on frontier coverage or information gain and cannot incorporate high-level task intent, such as searching fo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-18 · Jinchang Zhu, Jindong Li, Cheng Zhang, Jiahong Liu, Menglin Yang
Research Track A · General AI
Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversation history as unstructured embedding vectors, retrieving information through semantic similarity. This paradigm fails to …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-19 · Ziqing Zhuang, Linhai Zhang, Jiasheng Si, Deyu Zhou, Yulan He
Research Track A · General AI
Large language models (LLMs) have demonstrated strong reasoning capabilities, and as existing approaches for enhancing LLM reasoning continue to mature, increasing attention has shifted toward meta-reasoning as a promising direction for further improvement. However, most existing meta-reasoning methods remain episodic:…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-21 · Yutian Chen, Shi Guo, Renbiao Jin, Tianshuo Yang, Xin Cai, Yawen Luo, Mingxin Yang, Mulin Yu, Linning Xu, Tianfan Xue
General AI
Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric cons…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-21 · Zhihong Zhang, Jie Zhao, Xiaojian Huang, Jin Xu, Zhuodong Luo, Xin Liu, Jiansheng Wei, Xuejin Chen
General AI
Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets face three key challenges: lack of granularity in preference strength, textual styl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-21 · Feihao Fang, My T. Thai, Yuanyuan Lei
General AI
Large Language Models (LLMs) still struggle with multi-step logical reasoning. Existing approaches either purely refine the reasoning chain in natural language form or attach a symbolic solver as an external module. In this work, we instead ask whether LLMs contain a shared internal logical subspace that simultaneously…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-21 · Perry Dong, Alexander Swerdlow, Dorsa Sadigh, Chelsea Finn
General AI
Some of the most performant reinforcement learning algorithms today can be prohibitively expensive as they use test-time scaling methods such as sampling multiple action candidates and selecting the best one. In this work, we propose FASTER, a method for getting the benefits of sampling-based test-time scaling of diffu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-21 · Yiwen Qiu, Linjuan Wu, Yizhou Liu, Yuchen Yan, Jin Ma, Xu Tan, Yao Hu, Daoxin Zhang, Wenqi Zhang, Weiming Lu, Jun Xiao, Yongliang Shen
General AI
Large language models have achieved remarkable progress on complex reasoning tasks. However, they often implicitly fabricate information when inputs are incomplete, producing confident but unreliable conclusions -- a failure mode we term ungrounded reasoning. We argue that this issue arises not from insufficient reason…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-22 · Yupeng Zheng, Xiang Li, Songen Gu, Yuhang Zheng, Shuai Tian, Weize Li, Linbo Wang, Senyu Fei, Pengfei Li, Yinfeng Gao, Zebin Xing, Yilun Chen, Qichao Zhang, Haoran Li, Wenchao Ding
General AI
Recent advances in Vision-Language-Action (VLA) models have opened new avenues for robot manipulation, yet existing methods exhibit limited efficiency and a lack of high-level knowledge and spatial awareness. To address these challenges, we propose PokeVLA, a lightweight yet powerful foundation model for embodied manip…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-22 · Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele
General AI
Large vision-language models (LVLMs) have demonstrated impressive performance in various multimodal understanding and reasoning tasks. However, they still struggle with object hallucinations, i.e., the claim of nonexistent objects in the visual input. To address this challenge, we propose Region-aware Chain-of-Verifica…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-23 · Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Chaorui Deng, Kunchang Li, Zihan Ding, Yuwei Guo, Fuyun Wang, Fangqi Zhu, Xiaonan Nie, Shenhan Zhu, Shanchuan Lin, Hongsheng Li, Weilin Huang, Guang Shi, Haoqi Fan
General AI
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-24 · Yunquan Chen, Haoyu Chen
General AI
Understanding social dominance in animal behavior is critical for neuroscience and behavioral studies. In this work, we explore the capability of Multimodal Large Language Models (MLLMs) to analyze raw behavioral video of mice and predict their dominance hierarchy. We introduce MTT-Bench, a novel benchmark comprising an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-27 · Cheng Wang, Zhibin He, Zhihao Peng, Shengyuan Liu, Yufan Hu, Lichao Sun, Xiang Li, Yixuan Yuan
General AI
Agentic artificial intelligence systems promise to accelerate scientific workflows, but neuroimaging poses unique challenges: heterogeneous modalities (sMRI, fMRI, dMRI, EEG), long multi-stage pipelines, and persistent reproducibility risks. To address this gap, we present NeuroClaw, a domain-specialized multi-agent re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-27 · Amal AKLI, Mike PAPADAKIS, Maxime CORDY, Yves Le TRAON
General AI
Large language models are increasingly used for code generation, yet the correctness of their outputs depends not only on model capability but also on how tasks are specified. Prior studies demonstrate that small changes in natural language prompts, particularly under-specification, can substantially reduce code correct…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-28 · Wei-Chun Chen, Yu-Xuan Chen, I-Fang Chung, Ying-Jia Lin
General AI
Accurate nutrient estimation from unstructured recipe text is an important yet challenging problem in dietary monitoring, due to ambiguous ingredient terminology and highly variable quantity expressions. We systematically evaluate models spanning a wide range of representational capacity, from lexical matching methods …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-28 · Pengcheng Fang, Yuxia Chen, Xiaohao Cai
General AI
Video temporal grounding (VTG) aims to localize the start and end timestamps of the event described by a given query within an untrimmed video. Despite the strong open-world video understanding and recognition ability of video language large models (Vid-LLMs), outputting precise temporal grounding information remains c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-30 · Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng
General AI
Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing explicit linguistic reasoning that suffers from latency and discretization, or utilizing more expressive continuous latent…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-04-30 · Tao Ge, Baolin Peng, Hao Cheng, Jianfeng Gao
General AI
Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at S…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-05-01 · Yan Fang, Mengcheng Lan, Zilong Huang, Weixian Lei, Yunqing Zhao, Yujie Zhong, Yingchen Yu, Qi She, Yao Zhao, Yunchao Wei
General AI
In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining framework for Vision Transformers (ViTs) designed for multimodal large language models (MLLMs). To better align vision encoders with the autoregressive nature of LLMs, GenLI…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-05-01 · Yuan Li, Jun Hu, Jiaxin Jiang, Bryan Hooi, Bingsheng He
General AI
Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constra…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-05-01 · Sailesh Panda, Pritam Kadasi, Abhishek Upperwal, Mayank Singh
General AI
Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithfully execute the procedure specified in a prompt. We study this question through a controlled diagnostic benchmark for procedural execution, where models are given a st…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-04 · I Putu Adi Pratama, Bahadorreza Ofoghi, Atul Sajjanhar, Shang Gao
General AI
Medical visual question answering (Med-VQA) has strong potential for clinical decision support by enabling AI models to interpret medical images and answer clinically relevant queries. Recent approaches typically connect off-the-shelf vision encoders with large language models (LLMs) through lightweight mapping network…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-04 · Shaohui Dai, Yansong Qu, You Shen, Shengchuan Zhang, Liujuan Cao
General AI
Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part stru…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-04 · Xinnong Zhang, Wanting Shan, Hanjia Lyu, Zhongyu Wei, Jiebo Luo
General AI
Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether they are highly sensitive to semantically independent changes in conversational c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-04 · Zhuoming Chen, Xinrui Zhong, Qilong Feng, Ranajoy Sadhukhan, Yang Zhou, Michael Qizhe Shieh, Zhihao Jia, Beidi Chen
General AI
Sparse attention is becoming increasingly important for serving large language models (LLMs) as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive, slowing both human researchers and AI agents in exploring the sparse atten…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-08 · Suraj Biswas, Saurabh Gupta, Pritam Mukherjee
General AI
Ask a pretrained biomedical language model whether "cortisol 28 ug/dL" and "stock-market volatility" are related, and it returns a cosine similarity of 0.83 on a scale where 1.0 means identical. The two share no mechanism. This is not a corner case: every off-the-shelf biomedical encoder we tested (BioBERT, PubMedBERT,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-08 · Hao Shi, Weiye Li, Bin Xie, Yulin Wang, Renping Zhou, Tiancai Wang, Xiangyu Zhang, Ping Luo, Gao Huang
Research Track A · General AI
Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past interactions and imagination of future states. However, most VLA models rely primarily on the current observation and therefore struggle with long-horizon, temporally dependent tasks. Cognitive science suggests th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-08 · Mingxian Lin, Shengju Qian, Yuqi Liu, Yi-Hua Huang, Yiyu Wang, Wei Huang, Yitang Li, Fan Zhang, Zeyu Hu, Lingting Zhu, Xin Wang, Xiaojuan Qi
General AI
Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-10 · Haotao Xie
General AI
Recently, large language models (LLMs) have achieved promising progress in the fields of classical Chinese translation and the generation of classical poetry. However, domain-specific research on precise translation and affective-semantic understanding of classical poetry remains limited. The main challenge is that mos…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-11 · Zach Studdiford, Gary Lupyan
General AI
When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's behavior does not exhibit the same types of failures because human reasoning use…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-15 · Mariam Elbakry, Aliaa Sayed Sheha, Salma Hassan Tantawy, Aya Yassin, Concetto Spampinato, Karim Lekadir, Xiaomeng Li, Marawan Elbatel
General AI
Multiphasic contrast-enhanced CT (CECT) is widely used for abdominal lesion characterization, yet it carries inherent risks of contrast-induced nephropathy, escalates acquisition burden, and heavily contributes to radiologist workload. To address these challenges, we introduce a novel multi-center benchmark for multi-o…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-15 · Mehmet Iscan
General AI
Frozen small code models (<=1.5B parameters, run locally without fine-tuning) suit offline and privacy-constrained use, but often emit plausible-but-wrong programs. A natural remedy is a post-hoc operator that selects, verifies, repairs, or re-processes the model's samples without retraining; in principled form it is P…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-16 · Ankita Samaddar, Sandeep Neema, Daniel Balasubramanian, Xenofon Koutsoukos
General AI
With sophisticated cyber-attacks becoming increasingly prevalent, modern networks require intelligent autonomous cyber-defense agents trained via Reinforcement Learning (RL). These agents employ neurosymbolic approaches such as behavior trees with learning-enabled components (LECs) to learn, reason, adapt, and implemen…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-16 · Jinghan Wu, Jing Li, Ivor W. Tsang, Xuetao Zhang
General AI
Visual information helps resolve ambiguity in coreference resolution, leading to notable performance gains. However, existing Multi-modal Coreference Resolution (MCR) methods require training with (partially) annotated data from the target dataset before they can be applied, preventing their direct usability and raisin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-16 · Weizhi Zhang, Zechen Li, Hamid Palangi, Ben Graef, A. Ali Heydari, Simon A. Lee, Salman Rahman, Ray Luo, Zeinab Esmaeilpour, Erik Schenck, Chloe Zhang, Yamin Li, Menglian Zhou, Philip S. Yu, Daniel McDuff, Lindsey Sunden, Mark Malhotra, Shwetak Patel, Ahmed A. Metwally
General AI
LLM-empowered personal health agents that draw on user health (sensor) metrics offer a promising pathway to alleviate global disparities in healthcare access. However, large-scale clinical deployment remains constrained by an open-ended evaluation bottleneck: physician annotation is reliable but costly and unscalabl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-17 · Yijin Wang, Shuyi Wang, Wenhan Zhang, Yuqi Ouyang
General AI
Text-rich images often contain privacy-sensitive, transactional, or decision-relevant information. As recent multimodal image generation models become increasingly capable of synthesizing realistic textual content and structured visual designs, detecting AI-generated text-rich images has become an important challenge f…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-17 · Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev, Daria Pugacheva, Albina Burlova, Mikhail Kolosov, Denis Shepelev, Andrey Kuznetsov, Elena Tutubalina, Aleksandr I. Panov, Alexey K. Kovalev, Vlad Shakhuro
Research Track A · General AI
Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet it is unclear how much commonsense and factual knowledge they retain after adaptation. Failures on knowledge-sensitive tasks are ambiguous, conflating missing knowledge with poor generalizat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-17 · Ikram Belmadani, Oumaima El Khettari, Carlos Ramisch, Frederic Bechet, Richard Dufour, Benoit Favre
General AI
The development of large language models (LLMs) has led to an increased focus on their adaptation to specialized domains and languages, yet the effectiveness of domain adaptation strategies remains unclear. We present a study of medical domain adaptation using French medical question-answering (QA) as a case study. We …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.2
2026-04-08 · Jiwan Chung, JiHyuk Byun, Vibhav Vineet, Seon Joo Kim
Research Track B · General AI
Web agents act through long interaction sequences, yet existing benchmarks evaluate only terminal success, discarding all process information and offering little guidance on improvement. In this work, we conduct a process-level analysis of web agents. We introduce WebStep, a benchmark of 1,800 task instances with contr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.2
2026-06-23 · Qi Chen, Wenxuan Li, Pedro R. A. S. Bassi, Xinze Zhou, Jakob Wasserthal, Ibrahim Ethem Hamamci, Sezgin Er, Ashwin Kumar, Yiwen Ye, Yuhan Wang, Yuyin Zhou, Akshay S. Chaudhari, Curtis Langlotz, Kang Wang, Yang Yang, Alan L. Yuille, Zongwei Zhou
General AI
Artificial intelligence (AI) has achieved remarkable success in medical imaging, but it is widely recognized that these models often perform inconsistently across real-world clinical settings. Such inconsistencies occur when patient demographics and imaging protocols vary, for example, in detecting small tumors, analyz…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-06-24 · Lea Roxanne Muth, Marian Margraf
Research Track A · General AI
This paper presents a novel approach to semi-automated BSI IT-Grundschutz certification using a Multi-Large Language Model system (MLS) with Hybrid Retrieval-Augmented Generation (HybridRAG). Facing the challenges of the Network and Information Security Directive 2 (NIS2), a shortage of specialists, and…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-06-24 · Poojitha Thota, Shirin Nilizadeh
General AI
Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model behavior. In this setting, adversaries manipulate fine-tuning data to induce persistent sum…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.2
2026-06-24 · Babak Rahmani, Sebastian Dziadzio, Joschka Strüber, Sergio Hernández-Gutiérrez, Matthias Bethge
General AI
For most of scientific history, researchers studying behavior could only infer hidden mechanisms from outward actions: an inverse problem that becomes more tractable when observation is augmented by targeted intervention. We pose a computational analogue: given only behavioral traces of an agent in a game environment, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.1
2026-07-02 · Quoc Bao Phan, Tuy Tan Nguyen
General AI
Federated learning (FL) enables collaborative model training across distributed devices without sharing raw data, making it suitable for privacy-sensitive robotic sensing applications. However, multi-agent systems generate heterogeneous and non-independent and identically distributed (non-IID) multimodal sensor streams…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.0
2026-03-24 · Xinyao Wu, Zhe Xu, Cheng Chen, Jiawei Ma, Yefeng Zheng, Raymond Kai-yu Tong
Research Track A · General AI
Class-incremental learning (CIL) in medical image-guided diagnosis requires retaining prior diagnostic knowledge while adapting to newly emerging disease categories, which is critical for scalable clinical deployment. This problem is particularly challenging due to heterogeneous data and privacy constraints that preven…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.0
2026-04-08 · Wonseon Lim, Jaesung Lee, Dae-Won Kim
Research Track A · General AI
Continual learning (CL) on edge devices requires not only high accuracy but also training-time efficiency to support on-device adaptation under strict memory and computational constraints. While prompt-based continual learning (PCL) is parameter-efficient and achieves competitive accuracy, prior work has focused mainly…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.0
2026-04-14 · Chuang Peng, Wei Zhang, Renshuai Tao, Xinhao Zhang, Jian Yang
Research Track B · General AI
Text-based web agents offer computational efficiency for autonomous web navigation, yet developing robust agents remains challenging due to the noisy and heterogeneous nature of real-world HTML. Standard Supervised Fine-Tuning (SFT) approaches fail in two critical dimensions: they lack discrimination capabilities to re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.0
2026-04-15 · Aaron Pache, Mark CW van Rossum
Research Track A · General AI
Synaptic plasticity is metabolically expensive, yet animals continuously update their internal models without exhausting energy reserves. However, when artificial neural networks are trained, the network parameters are typically updated on every sample that is presented, even if the sample was classified correctly. Ins…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-04-29 · Mingze Li, Yu Rong, Songyou Li, Lihong Wang, Jiacheng Cen, Liming Wu, Anyi Li, Zongzhao Li, Qiuliang Liu, Rui Jiao, Tian Bian, Pengju Wang, Hao Sun, Jianfeng Zhang, Ji-Rong Wen, Deli Zhao, Shifeng Jin, Tingyang Xu, Wenbing Huang
General AI
The discovery of novel materials is critical for global energy and quantum technology transitions. While deep learning has fundamentally reshaped this landscape, existing predictive or generative models typically operate in isolation, lacking the autonomous orchestration required to execute the full discovery process. …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.0
2026-05-01 · Dongxin Guo, Jikun Wu, Siu Ming Yiu
Research Track B · General AI
AI agents execute tens to hundreds of chained LLM calls per task, yet GPU schedulers treat each call as independent, discarding gigabytes of intermediate state between steps and inflating end-to-end latency by 3-8x. We argue that this request-level abstraction is fundamentally mismatched to compound AI workloads, and p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.0
2026-05-02 · Xiaoyu Yang, En Yu, Jie Lu
Research Track A
In the pursuit of autonomous learning systems, the foundational assumption of stationarity, the premise that data distributions and model behaviors remain constant, is fundamentally untenable. Historically, the research community has addressed non-stationary environments almost exclusively under the scope of concept dr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.0
2026-05-07 · Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu
Research Track B · General AI
GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution metho…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-05-07 · Xin Gao, Ruiyi Zhang, Meixi Du, Peijia Qin, Pengtao Xie
Research Track A · General AI
Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensiv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.0
2026-05-07 · Hao Ye, Jisheng Dang, Junfeng Fang, Bimei Wang, Yizhou Zhang, Ning Lv, Wencan Zhang, Hong Peng, Bin Hu, Tat-Seng Chua
Research Track A · General AI
Recent extensive research has demonstrated that the enhanced reasoning capabilities acquired by models through Reinforcement Learning with Verifiable Rewards (RLVR) are primarily concentrated within the rank-1 components. Predicated on this observation, we employed Periodic Rank-1 Substitution and identified a counteri…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-05-21 · Karan Goyal
General AI
The rapid proliferation of Vision-Language Models (VLMs) is often framed as enabling unified multimodal knowledge discovery but rests on an under-examined assumption: that current VLMs faithfully synthesise multimodal data. We argue they often do not, and this gap reflects a trustworthiness problem in the dominant Visi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.0
2026-05-21 · Pilchen Hippolyte, Fabre Romain, Signe Talla Franck, Perez Patrick, Grave Edouard
Research Track A · General AI
Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of pre-training dynamics on the acquisition of time-sensitive factual knowledge, focusing specifically…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-05-27 · Jungwon Park, Jimyeong Kim, Jungmin Ko, Nojun Kwak, Wonjong Rhee
General AI
Diffusion language models decode text by iteratively denoising masked token sequences, making the choice of which positions to decode a central inference-time decision. Most training-free decoding strategies use model confidence for position selection, assuming that high-confidence positions are ready to be decoded. In…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-05-28 · Long Phan, Devin Kim, Alexander Pan, Alice Blair, Adam Khoja, Dan Hendrycks
General AI
Large language models (LLMs) exhibit systematic political bias across a variety of sensitive contexts. We find that LLMs handle counterpart topics from opposing political sides asymmetrically. We refer to this phenomenon as covert political bias and identify 7 categories of techniques through which it operates. We prop…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-05-28 · Corrado Rainone, Davide Belli, Bence Major, Arash Behboodi
General AI
The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substantially high cost, and more cost-efficient small language models (SLMs), which are amenable to on-device inference. Hybr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-06-18 · Kaiyue Yang, Yuyan Bu, Jingwei Yi, Yuchi Wang, Biyu Zhou, Juntao Dai, Songlin Hu, Yaodong Yang
General AI
As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool sel…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.0
2026-06-18 · Dayeon Kang, Hyejun Jeong, Jade Sheffey, Pubali Datta, Amir Houmansadr
Research Track B · General AI
As AI web agents proliferate, combining large language models with autonomous, browser-level control, indiscriminate content scraping by web agents has emerged as a privacy and security challenge. Existing defenses, such as robots.txt and active bot-blocking, are insufficient, as they are widely violated and easily cir…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 11.0
2026-06-28 · Ze Huang, Jiahui Zhang, Hairuo Liu, Chenxi Zhang, Ran Cheng, Li Zhang
General AI
We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pre…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 11.0
2026-06-29 · Disen Lan, Jianbin Zheng, Yuxi Ren, Xin Xia, Xuanda Wang, Xuefeng Xiao, Xipeng Qiu, Yu Cheng
General AI
Hybrid attention models improve long-context efficiency by retaining only a subset of full-attention layers and replacing the remaining layers with linear attention. However, the effectiveness of Transformer-to-hybrid conversion critically depends on which layers preserve full attention. Existing hybrid layer selection…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 11.0
2026-06-29 · Zhiqi Li, Chengrui Dong, Zhenhua Du, Hangning Zhou, Cong Qiu, Hailong Qin, Mu Yang, Dongxu Wei, Peidong Liu
General AI
Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. We propose Walking in the Implicit, a scene-centric paradigm that changes the rollout variable from frame latents to a fi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.9
2026-06-23 · J. Fernando Hernandez-Garcia, Tomás Figliolia, Beren Millidge
Research Track A · General AI
The loss of plasticity - the ability of a network to learn new information after having already learned older information - is a fundamental challenge in creating artificial neural networks capable of continual learning. Although this phenomenon has been known for decades, it has mostly been studied in older, relativel…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-04 · Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong
General AI
Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, assigning high rewards to flawed reasoni…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-07 · Omar El Khalifi, Thomas Rossi, Oscar Fossey, Thibault Fouque, Ulysse Mizrahi, Philip Torr, Ivan Laptev, Fabio Pizzati, Baptiste Bellot-Gurlet
General AI
For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-07 · Lujia Zhong, Yihao Xia, Jianwei Zhang, Shuo Huang, Jiaxin Yue, Mingyang Xia, Yonggang Shi
General AI
Multimodal neuroimaging analysis often involves complex, modality-specific preprocessing workflows that require careful configuration, quality control, and coordination across heterogeneous toolchains. Beyond preprocessing, downstream statistical analysis and disease classification commonly require task-specific code, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-12 · Christos Ziakas, Alessandra Russo, Avishek Joey Bose
General AI
Generative policies based on expressive model classes, such as diffusion and flow matching, are well-suited to complex control problems with highly multimodal action distributions. Their expressivity, however, comes at a significant inference cost: generating each action typically requires simulating many steps of the …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-12 · Hannes Büchi, Manon Flageat, Eduardo Sebastián, Amanda Prorok
General AI
Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve, and to do so at the right moment. Yet, current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity are still limited by the fact that they bind fixed behaviors to fixed agent identities…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-21 · Zacharie Chenail-Larcher, Brahim Mahmoudi, Naouel Moha, Quentin Stiévenart, Florent Avellaneda
General AI
Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, due to their versatility, flexibility, and ability to simulate human reasoning to some extent. However, poor integration of LLM inference in source code can undermine software system quality. Therefore, inadequate LLM i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-22 · Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou, Zisu Huang, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo
General AI
Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-28 · Anany Kotawala
General AI
Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is locally coherent. We formalise this locally coherent, globally incoherent failure via the compositional residual eps*, th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-05-28 · Qinpei Luo, Ruichun Ma, Xinyu Zhang, Lili Qiu
General AI
Printed circuit board (PCB) schematic design defines nearly all electronic hardware, but it remains manual and expertise-intensive. While generative AI has advanced digital and analog IC design, PCB schematic generation from natural-language intent is largely unexplored. This paper presents SchGen, the first large lang…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-06-29 · Haitao Wu, Qirui Zhang, Zhouheng Yao, Shangquan Sun, Qihao Zheng, Mianxin Liu, Chi Zhang, Wanli Ouyang, Chunfeng Song, Changqing Zhang, Jiamin Wu
General AI
Modeling the bidirectional correspondence between external sensory stimuli and internal neural activity has emerged as a critical frontier in neuroscience. However, existing approaches predominantly treat brain encoding and decoding as isolated tasks, relying heavily on unimodal alignment and external priors while over…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-06-29 · Shun Lei, Huaicheng Zhang, Dapeng Wu, Yaoxun Xu, Lishi Zuo, Wei Tan, Hangting Chen, Guangzheng Li, Jianwei Yu, Zhiyong Wu, Dong Yu
General AI
Full-length song generation must preserve coherence and musicality, render detailed vocal and accompaniment acoustics, and follow lyrics and prompts. Existing language model-based systems face a structural trade-off: mixed-token modeling preserves vocal-instrument coordination but obscures track-specific details, where…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-06-29 · Chuyue Li, Ziqi Tang, Jingyi Wang, Yu Wu, Kazuma Hashimoto, Lingyu Gao
General AI
With the advancement of Large Language Models (LLMs), code error detection has extended beyond traditional IDE diagnostics to context-sensitive debugging in educational scenarios. However, existing approaches lack large-scale datasets, multi-error analysis, and unified error taxonomies. To address this, we introduce Py…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.8
2026-06-29 · Mohamed El Amine Boudjoghra, Ivan Laptev, Angela Dai
General AI
Articulated 3D objects are essential for interactive environments in embodied AI, robotics, and virtual reality, but reconstructing their structure and motion from sparse observations remains challenging. Existing approaches remain largely constrained by lack of supervised data or lack the priors needed to reliably rec…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.8
2026-07-01 · Shengguang Wu, Hao Zhu, Yuhui Zhang, Xiaohan Wang, Serena Yeung-Levy
General AI
Memory expertise is a learned skill: knowing what to encode, when to retrieve, and how to organize knowledge, a capacity known in cognitive science as metamemory. We bring this perspective to LLMs by treating memory management as a trainable skill. We promote file-system operations to first-class memory actions alongsi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.7
2026-06-18 · Guangyi Liu, Gao Wu, Congxiao Liu, Pengxiang Zhao, Liang Liu, Mading Li, Qi Zhang, Mengyan Wang, Liang Guo, Yong Liu
Research Track B · General AI
MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.6
2026-07-02 · Josh Hills, Ida Caspary, Asa Cooper Stickland
General AI
As AI coding agents become more autonomous, they increasingly ship code iteratively, with the codebase persisting across sessions. This persistence creates a new attack surface: a misaligned or prompt-injected agent can distribute attacks across pull requests (PRs) and time its payload for the PR with the best natural …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.6
2026-07-02 · Vivienne Ming
General AI
Whether pairing people with AI helps or hurts is usually reported as a single average effect. Using a real-money prediction market (Polymarket) as an objective, externally resolved benchmark, this pilot shows that the value of human-AI collaboration depends on a specific, measurable form of human capital. Analyzed at t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.6
2026-07-02 · Matteo Boglioni, Thibault Rousset, Siva Reddy, Marius Mosbach, Verna Dankers
General AI
LLMs memorize sensitive training data, including personally identifiable information (PII), creating a pressing need for reliable post hoc removal methods. Unlearning has emerged as a promising solution, with state-of-the-art (SOTA) methods often following a localize-first, unlearn-second paradigm that targets specific …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.5
2026-03-11 · Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi
General AI
A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and tha…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-03-25 · Adidev Jhunjhunwala, Judah Goldfeder, Hod Lipson
Research Track A
A key challenge to understanding self-awareness has been a principled way of quantifying whether an intelligent system has a concept of a "self," and if so how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of cognitive process th…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.5
2026-03-27 · Nicholas Edwards, Sebastian Schuster
General AI
As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally resolve underspecification by asking clarifying questions, current agents are largely optimize…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-03-28 · Zhuoyang Qian, Wei Shi, Xu Lin, Li Ling, Meng Luo, Ziming Wang, Zhiwei Zhang, Tengyue Xu, Gaoge Liu, Zhentao Zhang, Shuo Zhang, Ziqi Wang, Zheng Feng, Yan Luo, Shu Xu, Yongjin Chen, Zhibo Feng, Zhuo Chen, Bruce Yuan, Biao Wu, Harry Wang, Kris Chen
General AI
Generating scientific manuscripts requires maintaining alignment between narrative reasoning, experimental evidence, and visual artifacts across the document lifecycle. Existing language-model generation pipelines rely on unconstrained text synthesis with validation applied only after generation, often producing struct…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-03-30 · Fiorenzo Stoppa, Stephen J. Smartt
Research Track A
We present SNID-SAGE (SuperNova IDentification-Spectral Analysis and Guided Exploration), a framework for supernova spectral classification with both a fully interactive graphical interface and a scriptable command-line pipeline for large-scale processing. The pipeline combines deterministic spectral preprocessing, FFT…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-03-31 · Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, Pengfei Liu
General AI
Can AI accelerate the development of AI itself? While recent agentic systems have shown strong performance on well-scoped tasks with rapid feedback, it remains unclear whether they can tackle the costly, long-horizon, and weakly supervised research loops that drive real AI progress. We present ASI-Evolve, an agentic fr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-03-31 · Wenli Li, Kai Zhao, Haoran Jiang, Enquan Yang, Yi Su, Dan Zeng
General AI
Vision-language models (VLMs) have been widely adopted for 3D question answering (3D QA). In typical pipelines, visual tokens extracted from multiple viewpoints are concatenated with language tokens and jointly processed by a large language model (LLM) for inference. However, aggregating multi-view observations inevita…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-01 · Benjamin Turtel, Paul Wilczewski, Kris Skotheim
General AI
Anticipating supply chain disruptions before they materialize is a core challenge for firms and policymakers alike. A key difficulty is learning to reason reliably about infrequent, high-impact events from noisy and unstructured inputs - a setting where general-purpose models struggle without task-specific adaptation. …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-10 · Han Luo, Guy Laban
General AI
Large language models are increasingly deployed in multi-turn settings such as tutoring, support, and counseling, where reliability depends on preserving consistent roles, personas, and goals across long horizons. This requirement becomes critical when LLMs are used to generate synthetic dialogues for training and eval…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-15 · Tianshuo Yang, Guanyu Chen, Yutian Chen, Zhixuan Liang, Yitian Liu, Zanxin Chen, Chunpu Xu, Haotian Liang, Jiangmiao Pang, Yao Mu, Ping Luo
General AI
While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often compromises the profound reasoning capabilities inherited from their base Vision-Language Models (VLMs). To resolve this fundamental trade-off, we propose HiVLA, a visu…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-15 · Haoran Xu, Kaiwen Hu, Somayeh Sojoudi, Amy Zhang
General AI
We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base model in LLM RL finetuning) is essential to prevent value over-optimization caused by erroneous out-of-distribution extrapolation. Existing methods either rely on repara…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-16 · Jinchang Liu, Qingshan Zhou, Hongkan Chen, Yi Bu
Research Track A
Science advances not only by accumulating discovered patterns but by changing how new problems and solutions are expressed. While structural indicators track scholarly attention, they offer only an indirect proxy for the reorganization of meaning. We propose a semantic geometry based on the R-P-C (references, focal pub…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-18 · Jiaxin Zhang, Xiangyu Peng, Qinglin Chen, Qinyuan Ye, Caiming Xiong, Chien-Sheng Wu
Research Track A
On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps models in severe overconfidence. We trace this failure to an information mismatch: t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-20 · Hyeonseo Jang, Hyuk Kwon, Kibok Lee
Research Track A
We investigate recently introduced domain-class incremental learning scenarios for vision-language models (VLMs). Recent works address this challenge using parameter-efficient methods, such as prefix-tuning or adapters, which facilitate model adaptation to downstream tasks by incorporating task-specific information int…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-22 · Shanshan Zhong, Yi Lu, Jingjie Ning, Yibing Wan, Lihan Feng, Yuyi Ao, Leonardo F. R. Ribeiro, Markus Dreyer, Sean Ammirati, Chenyan Xiong
Research Track A · General AI
Skills have become the de facto way to enable LLM agents to perform complex real-world tasks with customized instructions, workflows, and tools, but how to learn them automatically and effectively remains unclear. We introduce SkillLearnBench, the first benchmark for evaluating continual skill learning methods, compris…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-23 · Itay Nakash, George Kour, Ateret Anaby-Tavor
General AI
Large language models (LLMs) are increasingly deployed as customer-facing agents, yet evaluating their reliability remains challenging due to stochastic, multi-turn interactions. Current evaluation protocols rely on linear Monte Carlo rollouts of complete agent-user conversations to estimate success. However, this appr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-04-27 · Hongxin Li, Yuntao Chen, Zhaoxiang Zhang
Research Track B · General AI
Graphical User Interface (GUI) element grounding (precisely locating elements on screenshots based on natural language instructions) is fundamental for agents interacting with GUIs. Deploying this capability directly on resource-constrained devices like mobile phones is increasingly critical for GUI agents requiring lo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-29 · Karthik Charan Raghunathan, Christian Metzner, Laura Kriener, Melika Payvand
Research Track A · General AI
In a continual learning setting, we require a model to be plastic enough to learn a new task and stable enough to not disturb previously learned capabilities. We argue that this dilemma has an architectural root. A finite network has limited representational and plastic resources, yet the required capacity depends on p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-04-30 · Kathrin Korte, Joachim Winter Pedersen, Eleni Nisioti, Sebastian Risi
Research Track A
To preserve previously learned representations, continual learning systems must strike a balance between plasticity, the ability to acquire new knowledge, and stability. This stability-plasticity dilemma affects how representations can be reused across tasks: shared structure enables transfer when tasks are similar but…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-05-07 · Bomin Wang, Hangqi Zhou, Yibo Gao, Xiahai Zhuang
Research Track A · General AI
Continual learning (CL) is essential for deploying medical image segmentation models in clinical environments where imaging domains, anatomical targets, and diagnostic tasks evolve over time. However, continual segmentation still faces three main challenges. First, the scenarios for this task remain insufficiently stan…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-05-11 · Lungchuan Chen
Research Track A · General AI
Memory consolidation, the process by which transient experiences are transformed into stable, structured representations, is a foundational organizing principle in the human brain, yet it remains largely unexplored as a design principle for modern sequence models. In this work, we leverage established neuroscientific t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-05-28 · Yilun Yao, Jiaming Pan, Elsie Dai, Peizhuang Cong, Yaoming Li, Tong Yang
Research Track A · General AI
Mixture-of-Experts (MoE) language models reduce per-token computation but still require storing and serving all experts, making deployment memory-intensive. Existing post-training compression methods mainly shrink this cost by pruning experts or merging their weights. We formulate post-training MoE compression as exper…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-03 · Muhammad Usama, Didier Stricker, Mohammad Sadil Khan, Muhammad Zeshan Afzal
General AI
Learning representations of CAD models is a largely open problem. While 3D representation learning has flourished around point clouds and meshes, the native format of CAD, the boundary representation (BRep), which encodes exact parametric surfaces, curves, and their topology, has received little attention as a representat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-06-04 · Hao Bai, Rui Yang, Chenlu Ye, Spencer Whitehead, Aviral Kumar, Tong Zhang
Research Track B · General AI
Training vision-language web agents with multi-step RL is compute-intensive, with two dominant forms of inefficiency: idle GPUs in synchronous RL, and trajectories that use more steps and tokens than necessary. We present AsyncWebRL, which addresses both. On the system side, an asynchronous design overlaps rollout, gra…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-04 · Qiuyu Tian, Haojie Yin, Yingce Xia, Youyong Kong, Zequn Liu
General AI
AI research often requires decisions before future evidence exists: which bottleneck to attack, which direction to pursue, or where a project should be positioned. We introduce ForeSci, a temporally controlled benchmark for evaluating whether LLM agents can make such forward-looking research judgements from historical …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-06-04 · Seyed Arshan Dalili, Mehrdad Mahdavi
Research Track A · General AI
Sparse Autoencoders (SAEs) are widely used for mechanistic interpretability in large language models, yet their formulation assigns each latent feature a single decoder direction, implicitly assuming features to be one-dimensional. We show that this assumption does not match the multi-dimensional structure of model fe…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-05 · Chung-En Sun, Linbo Liu, Tsui-Wei Weng
General AI
Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a session and become substantially safer after a few regular agentic tasks -- a phenomenon we term the cold-start safety gap. To study this systematically, we introduce Safety Ov…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.5
2026-06-06 · Amine El Hattami, Nicolas Chapados, Christopher Pal
Research Track B · General AI
AI agents increasingly turn past experience into reusable artifacts such as code, workflows, and procedural memories. Reuse can improve efficiency, but it also creates a lifecycle reliability problem: artifacts that succeed once may fail under environment drift, underspecified tasks, or changing task distributions, esp…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-07 · Suraj Ranganath, Anish Raghavendra
General AI
Enterprise property graphs vary widely in schema structure, internal terminology, domain assumptions, governance constraints, and user interaction patterns. A deployment-relevant Text2Cypher benchmark therefore reflects the questions users and agents actually ask of that graph. Creating such a benchmark is difficult be…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-09 · Xinyu Zhou, Boyu Zhu, Yi Xu, Zhiwei Li, Yingfa Chen, Huiming Wang, Zhijiang Guo
General AI
Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including HypeNet and Jet-Nemotron, retrieval performance on Needle-In-A-Haystack (NIAH) deteriorate…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-09 · Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik, Colin Raffel
General AI
Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly treating all attacks as equally costly. In practice, the computational expense of different attack strategies can vary by orders of magnitude. Consequently, ASR at a fixed …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-09 · Yuke Zhao, Wangbo Zhao, Weijie Wang, Zeyu Zhang, Dakai An, Akide Liu, Yinghao Yu, Jiasheng Tang, Fan Wang, Wei Wang, Bohan Zhuang
General AI
We introduce WorldOlympiad, a benchmark for diagnosing video-based world models across physical faithfulness, geometric consistency, and interaction fidelity. While existing benchmarks often focus on visual quality, semantic alignment, or short-term temporal coherence, they provide limited insight into whether generate…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-11 · Yujun Zhou, Kehan Guo, Haomin Zhuang, Xiangqi Wang, Yue Huang, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Nuno Moniz, Nitesh V. Chawla, Xiangliang Zhang
General AI
Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between preference access and preference compliance. In tasks derived from anonymized real-user friction case…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-06-11 · Hexuan Yu, Chaoyu Zhang, Heng Jin, Shanghao Shi, Ning Zhang, Y. Thomas Hou, Wenjing Lou
Research Track B · General AI
Modern LLM-powered autonomous agents increasingly rely on rich user interface (UI) state observations to achieve reliable action grounding in complex digital environments. However, many deployments transmit the full UI state to remote inference servers even when most elements are irrelevant to the current task, which c…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-12 · Chenxin Li, Zhengyao Fang, Zhengyang Tang, Pengyuan Lyu, Xingran Zhou, Xin Lai, Fei Tang, Liang Wu, Yiduo Guo, Weinong Wang, Junyi Li, Yi Zhang, Yang Ding, Huawen Shen, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang, Chengquan Zhang, Han Hu
General AI
Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-14 · Nafiseh Nikeghbal, Amir Hossein Kargaran, Shaghayegh Kolli, Jana Diesner
General AI
Standard accuracy benchmarks are designed to test how closely large language models (LLMs) approach correct answers, but are not suitable for testing whether LLMs stick with a correct answer when that answer is challenged by a plausible counter-argument. We introduce a controlled protocol for evaluating answer stabilit…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-06-16 · Nethmi Jayasinghe, Diana Gontero, Amit Ranjan Trivedi
Research Track A · General AI
Robots that learn over long deployments must add new skills without losing the shared policy structure that makes earlier skills reusable. We study sequential robot skill learning, where previous trajectories and task losses may be unavailable, and the deployed policy must remain a single shared controller without task…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-17 · Zijian Wang, Hanqi Li, Ziyue Yang, Zijian Hu, Shenghan Zuo, Yunzhe Zhang, Da Ma, Danyu Luo, Chenrun Wang, Jing Peng, Tiancheng Huang, Sijia Guo, Huayang Wang, Zichen Zhu, Senyu Han, Yilu Cao, Kai Yu, Lu Chen
General AI
AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-06-17 · Yujin Zhang, Daye Nam
Research Track B · General AI
AI web agents can perform complex, multi-step tasks such as searching for products, comparing options, and making purchases on behalf of users. However, verifying the correctness of an agent's output remains difficult. Existing transparency mechanisms, including full trajectory logs, source links, screenshots, and LLM-…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.5
2026-06-30 · Kaisen Yang, Zheng Jiang, Yuzhao Peng, Houde Qian, Boshi Zhang, Youjie Zheng, Shijin Hong, Qingle Liu, Ruoyu Han, Bohan Lyu, Bingxiang He, Eren Cai, Calvin Xiao, Qinhuai Na
Research Track A · Research Track B · General AI
Internet users collectively perform an enormous range of skilled work through web browsers, from software development and document editing to search, forms, and enterprise workflows, making human browsing a highly scalable but under-exploited source of reusable browser skills. We argue that the bottleneck for browser a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.4
2026-06-23 · Zhihao Wang, Jianxiong Li, Yu Cui, Yuan Gao, Xianyuan Zhan, Junzhi Yu, Xiao Ma
General AI
Generalist value models play a pivotal role in scaling robotic policy learning from large-scale, mixed-quality data. Mathematically, accurate value estimation demands deep temporal understanding, requiring models to both ground the current belief using historical context and plan over future outcomes. However, most exi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.4
2026-06-25 · Xinyu Wang, Chongbo Zhao, Fangneng Zhan, Yue Ma
General AI
Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: maintaining stable backgrounds and non-edited regions over time, and achieving the low latency required for real-time interactive scenarios. Meanwhile, recent streaming video generation methods are mostly deve…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.4
2026-06-26 · Siqiao Xue, Chunxue Xu
General AI
Adapting a foundation vision-language encoder to a specialized retrieval task creates a fundamental tradeoff: gains on the target distribution come at the cost of the foundation model's broad generalization, and fashion retrieval is a stringent instance of this problem. We present ZooClaw-FashionSigLIP2, a fashion-spec…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.3
2026-02-01 · Alberto Castelo, Zahra Zanjani Foumani, Ailin Fan, Keat Yang Koay, Vibhor Malik, Yuanzheng Zhu, Han Li, Meysam Feghhi, Ronie Uliana, Shuang Xie, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Lingyun Wang, Zhong Wu
Research Track B · General AI
A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model agents op…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.3
2026-03-24 · Qianlong Lan, Anuj Kaul
Research Track B · General AI
Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage spli…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-03-26 · Linyue Pan, Lexiao Zou, Shuo Guo, Jingchen Ni, Hai-Tao Zheng
General AI
Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead be externaliz…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-03-26 · Liping Yi, Zhiming Zhao, Qinghua Hu
General AI
Social learning highlights that learning agents improve not in isolation, but through interaction and structured knowledge exchange with others. When introduced into machine learning, this principle gives rise to social machine learning (SML), where multiple agents collaboratively learn by sharing abstracted knowledge.…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-03-26 · Jinbo Xing, Zeyinzi Jiang, Yuxiang Tuo, Chaojie Mao, Xiaotang Gai, Xi Chen, Jingfeng Zhang, Yulin Pan, Zhen Han, Jie Xiao, Keyu Yan, Chenwei Xie, Chongyang Zhong, Kai Zhu, Tong Shen, Lianghua Huang, Yu Liu, Yujiu Yang
General AI
Recent unified models have made unprecedented progress in both understanding and generation. However, while most of them accept multi-modal inputs, they typically produce only single-modality outputs. This challenge of producing interleaved content is mainly due to training data scarcity and the difficulty of modeling …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-03-30 · Min Wang, Ata Mahjoubfar
General AI
Agentic vision-language models increasingly act through extended interactions, but most evaluations still focus on single-image, single-turn correctness. We introduce AMIGO (Agentic Multi-Image Grounding Oracle Benchmark), a long-horizon benchmark for hidden-target identification over galleries of visually similar imag…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-03-30 · Yanyan Yan, Yang Feng, Jiangshan Liu, Di Liu, Zixi Liu, Hao Teng, Baowen Xu
General AI
The growing adoption of Rust for its memory safety and performance has increased the demand for effective migration of legacy C codebases. However, existing rule-based translators (e.g., C2Rust) often generate verbose, non-idiomatic code that preserves unsafe C semantics, limiting readability, maintainability, and pr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-03-31 · Mst. Fahmida Sultana Naznin, Adnan Ibney Faruq, Mushfiqur Rahman, Niloy Kumar Mondal, Md. Mehedi Hasan Shawon, Md Rakibul Hasan
General AI
Automated radiology report summarization aims to distill verbose findings into concise clinical impressions, but existing multimodal models often struggle with visual noise and fail to meaningfully improve over strong text-only baselines in the FINDINGS → IMPRESSION transformation. We challenge two prevailing assum…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-03-31 · Zhuowen Liang, Xiaotian Lin, Zhengxuan Zhang, Yuyu Luo, Haixun Wang, Nan Tang
General AI
Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, graph, or chunks) to support r…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-03-31 · Kaleb Newman, Tyler Zhu, Olga Russakovsky
General AI
Video diffusion models exhibit emergent reasoning capabilities like solving mazes and puzzles, yet little is understood about how they reason during generation. We take a first step towards understanding this and study the internal planning dynamics of video models using 2D maze solving as a controlled testbed. Our inv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-02 · Qiyao Zhang, Shuhua Zheng, Jianli Sun, Chengxiang Li, Xianke Wu, Zihan Song, Zhiyong Cui, Yisheng Lv, Yonglin Tian
General AI
Embodied visual tracking is crucial for Unmanned Aerial Vehicles (UAVs) executing complex real-world tasks. In dynamic urban scenarios with complex semantic requirements, Vision-Language-Action (VLA) models show great promise due to their cross-modal fusion and continuous action generation capabilities. To benchmark mu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-02 · Syed Ahmed, Bharathi Vokkaliga Ganesh, Jagadish Babu P, Karthick Selvaraj, Praneeth Talluri, Sanket Hingne, Anubhav Kumar, Anushka Yadav, Pratham Kumar Verma, Kiranmayee Janardhan, Mandanna A N
General AI
Understanding how Large Language Models (LLMs) process information from prompts remains a significant challenge. To shed light on this "black box," attention visualization techniques have been developed to capture neuron-level perceptions and interpret how models focus on different parts of input data. However, many ex…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-03 · Yunfei Bai, Amit Dhanda, Shekhar Jain
General AI
The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension, particularly for Chart Question Answering (CQA) tasks involving complex data vi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-06 · Songyuan Yang, Huibin Tan, Kailun Yang, Wenjing Yang, Shaowu Yang
General AI
We introduce AnyUser, a unified robotic instruction system for intuitive domestic task instruction via free-form sketches on camera images, optionally with language. AnyUser interprets multimodal inputs (sketch, vision, language) as spatial-semantic primitives to generate executable robot actions requiring no prior map…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-09 · Yuxuan Zhang, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao, Xuan Lu, Wendong Xu, Yunzhuo Hao, Songcheng Cai, Xiaochen Wang, Huaisong Zhang, Xian Wu, Yi Lu, Minyi Lei, Kai Zou, Huifeng Yin, Ping Nie, Liang Chen, Dongfu Jiang, Wenhu Chen, Kelsey R. Allen
General AI
AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that people need to accom…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-14 · Yiyang Huang, Yitian Zhang, Yizhou Wang, Mingyuan Zhang, Liang Shi, Huimin Zeng, Yun Fu
General AI
Despite significant progress in video-language modeling, hallucinations remain a persistent challenge in Video Large Language Models (Vid-LLMs), referring to outputs that appear plausible yet contradict the content of the input video. This survey presents a comprehensive analysis of hallucinations in Vid-LLMs and intro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-14 · Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram
General AI
Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness when trivially constrained? We show that simple lexical constraints (banning a single punctuation character or common word) cause instruction-tuned LLMs to collapse their responses, losing 14–48% of compre…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-14 · Joel Fokou
General AI
Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise applications will embed AI copilots by the end of 2026. As agents gain the ability to execute real-world actions (reading files, running commands, making network requests, modify…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-16 · Marcel Wagenländer, Otto White, Britannio Jarrett, Pedro Silvestre, Yanda Tao, Guo Li, Huanzhou Zhu, Lluís Vilanova, Peter Pietzuch
General AI
Agentic workflows carry out complex tasks by orchestrating multiple large language models (LLMs) and tools. Serving such workflows at a target throughput with low latency is challenging because they can be defined using arbitrary agentic frameworks and exhibit unpredictable execution times: execution may branch, fan-ou…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-16 · Xiao-Liang Qi
General AI
This article argues that the most important significance of the AI revolution, especially the rise of large language models, lies not simply in automation, but in a fundamental change in how complex information and human know-how are carried, replicated, and shared. From this perspective, AI for Science is especially i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-17 · Yi Lin, Yihao Ding, Yonghui Wu, Yifan Peng
General AI
Automated 3D radiology report generation often suffers from clinical hallucinations and a lack of the iterative verification found in human practice. While recent Vision-Language Models (VLMs) have advanced the field, they typically operate as monolithic "black-box" systems without the collaborative oversight character…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-20 · Xirui Li, Ming Li, Derry Xu, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh, Tianyi Zhou
General AI
Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diverse, verified environments on demand. To this end, we introduce ClawEnvKit, an aut…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-20 · Manan Gupta, Dhruv Kumar
General AI
Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce Latent Phase-Shift Rollback (LPSR): at each generation step, we monitor the residual stream at a critical layer l_crit,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-20 · Daniela Baiamonte, Elena Fano, Matteo Gabburo, Stefano Simonazzi, Leonardo Rigutini, Andrea Zugarini
General AI
Vision Language Models (VLMs) have achieved rapid progress in recent years. However, despite their growth, VLM development is heavily grounded in English, leading to two main limitations: (i) the lack of multilingual and multimodal datasets for training, and (ii) the scarcity of comprehensive evaluation benchmarks acro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-21 · Xianming Li, Zongxi Li, Tsz-fung Andrew Lee, Jing Li, Haoran Xie, Qing Li
General AI
Parameter-efficient fine-tuning (PEFT) reduces the training cost of full-parameter fine-tuning for large language models (LLMs) by training only a small set of task-specific parameters while freezing the pretrained backbone. However, existing approaches, such as Low-Rank Adaptation (LoRA), achieve adaptation by inserti…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-22 · Hanqi Li, Lu Chen, Kai Yu
General AI
As LLMs are increasingly integrated into agentic systems, they must adhere to dynamically defined, machine-interpretable interfaces. We evaluate LLMs as in-context interpreters: given a novel context-free grammar, can LLMs generate syntactically valid, behaviorally functional, and semantically faithful outputs? We intr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-23 · Irene Aldridge, Jolie An, Riley Burke, Michael Cao, Chia-Yi Chien, Kexin Deng, Ruipeng Deng, Yichen Gao, Olivia Guo, Shunran He, Zheng Li, George Lin, Weihang Lin, Percy Lyu, Alex Ng, Qi Wang, Hanxi Xiao, Dora Xu, Yuanyuan Xue, Sheng Zhang, Sirui Zhang, Yun Zhang, Sirui Zhao, Xiaolong Zhao, Yihan Zhao, Waner Zheng
General AI
The emergence of agentic artificial intelligence (AI) represents a fundamental transformation in financial markets, characterized by autonomous systems capable of reasoning, planning, and adaptive decision-making with minimal human intervention. This comprehensive survey synthesizes recent advances in agentic AI across…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-23 · Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu
General AI
Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated ta…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-23 · Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong, Jaehong Kim, Meeyoung Cha
General AI
Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-23 · Naheed Rayhan, Sohely Jahan
General AI
Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection (TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial intent across …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-24 · Jiajun Yu, Guodong Liu, Li Wang, Pengxiang Zhou, Wentao Liu, Yin He, Chao Xu, Fei Gao, Yanjun Cao
General AI
Parallel trajectory optimization via the Alternating Direction Method of Multipliers (ADMM) has emerged as a scalable approach to long-horizon motion planning. However, existing frameworks typically decompose the problem into parallel subproblems based on a predefined fixed structure. Such structural rigidity often cau…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-24 · Longju Bai, Zhemin Huang, Xingyao Wang, Jiao Sun, Rada Mihalcea, Erik Brynjolfsson, Alex Pentland, Jiaxin Pei
General AI
The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questions naturally arise: (1) Where do AI agents spend the tokens? (2) Which models are more token-efficient? and (3) Can agen…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-27 · Parsa Ashrafi Fashi, Utkarsh Saxena, Mehdi Rezagholizadeh, Aref Jafari, Akash Haridas, Mingyu Yang, Vansh Bhatia, Guihong Li, Vikram Appia, Emad Barsoum
General AI
Hybrid sequence models that combine efficient Transformer components with linear sequence modeling blocks are a promising alternative to pure Transformers, but most are still pretrained from scratch and therefore fail to reuse existing Transformer checkpoints. We study upcycling as a practical path to convert pretraine…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-27 · Zhenyu Zhao, Aparna Balagopalan, Adi Agrawal, Dilshoda Yergasheva, Waseem Alshikh, Daniel M. Bikel
General AI
Given the increased use of LLMs in financial systems today, it becomes important to evaluate the safety and robustness of such systems. One failure mode that LLMs frequently display in general domain settings is that of sycophancy. That is, models prioritize agreement with expressed user beliefs over correctness, leadi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-30 · Gyoung S. Na, Chanyoung Park
Research Track A · General AI
Deriving governing equations from empirical observations is a longstanding challenge in science. Although artificial intelligence (AI) has demonstrated substantial capabilities in function approximation, the discovery of explainable and extrapolatable equations remains a fundamental limitation of modern AI, posing a ce…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-04-30 · Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan, Manmohan Chandraker
General AI
Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-05-01 · Pavlin G. Poličar, Andraž Pevcin, Blaž Zupan
General AI
Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-ans…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-04 · Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou, Gio Huh, Robert Griffin, Georgia Gkioxari, Aaron Ames
General AI
For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead pr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-04 · Mehmet Iscan
General AI
Large language models increasingly write, review, and judge code, and a fast-growing practice equips them with prompt 'skills' that ask the model to reason like a scientist. A prominent example tells the model to act as a Popperian falsificationist, and such skills are reported to improve generated code. But these gain…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-04 · Ziwen Kan, Yishuo Chen, Kecheng Li, Andrew Wen, Xiaomeng Wang, Liwei Wang, Jihao Duan, Song Wang, Hongfang Liu, Tianlong Chen
General AI
Time series foundation models (TS-FMs) aim to learn generalizable temporal representations that can be adapted to a wide range of downstream tasks. In real-world multimodal settings, time series are frequently affected by temporal misalignment and partial modality missingness, where different modalities are observed at…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-04 · Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang, Furu Wei
General AI
Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse methods typically pro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-08 · Danqi Zhuang, Jisui Huang, Xiaoyue Xi, Andrew Kiggins, Xiaojie Wang, Ke Chen, Yue Wu
General AI
Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analytically convenient and empirically powerful, it provides little explicit structure for data concentrated near low-dimensional manifolds, where different regions…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-09 · George Perrett, Javae Elliott, Jennifer Hill, Marc Scott
General AI
Large Language Models (LLMs) are increasingly described as performing at the level of human experts on knowledge economy tasks. These claims are primarily based on how LLMs perform on benchmarking tasks that measure average performance across standardized datasets. Primary limitations of many benchmarking tasks are tha…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-09 · Avi Gupta, Nilotpal Sinha, Vishnu Raj, Sambuddha Saha, Pratik Joshi, Koteswar Rao Jerripothula, Tammam Tillo
General AI
Class-Incremental Learning (CIL) aims to continuously learn new classes without forgetting previously acquired knowledge. While recent CIL advances have spurred significant interest across various modalities, the audio-visual setting remains underexplored. Furthermore, although foundational multimodal models like SAM-A…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-09 · Wajih ul Islam, Muhammad Yaqoob, Javed Ali Khan, Volker Steuber
General AI
Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone, failing to replicate clinicians' multimodal reasoning. We explo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-09 · Gangwei Xu, Qihang Zhang, Jiaming Zhou, Xing Zhu, Yujun Shen, Xin Yang, Yinghao Xu
General AI
Autoregressive video generation has emerged as a powerful paradigm for World Action Models (WAMs). However, existing approaches suffer from slow training convergence and limited converged accuracy, particularly at high frame rates, as the training supervision is confined to the current chunk without explicit signals ab…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-09 · Haoling Zhou, Shixuan Zhao, Chao Wang, Zhiqiang Lin
General AI
Generative AI applications such as personal AI agents, image generators, and chat assistants offer advanced capabilities to improve user experience. Behind the scenes, Large Language Models (LLMs) that power these services require a massive amount of computation and are usually deployed in the cloud, available as APIs,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-14 · Claudio Fantinuoli
General AI
Machine interpreting (MI), the live, real-time branch of speech translation, has achieved remarkable progress on standard benchmarks, with some systems approaching human parity on textual fidelity. Yet the user experience remains far inferior to interpreter-mediated communication, revealing what we term the accur…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-15 · Hamidah Oderinwale
General AI
Benchmark scores tell you what an agent got right; they do not tell you how it got there. In this work, we introduce methods for comparing agents procedurally in different contexts, where the model, tasks, and approaches vary. We compare ten agents and find that they are identifiable by their behavioral habits, which w…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-15 · Yanan Long
General AI
Public AI evaluations are often read as terminal leaderboards, yet the underlying evidence is a selective time series shaped by reporting rules, benchmark revisions, and missingness. Repeated public archives for LiveBench and Open LLM Leaderboard v2 serve as the primary longitudinal record; LMArena provides a preferenc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-15 · Jisang Han, Seonghu Jeon, Jaewoo Jung, René Zurbrügg, Honggyu An, Tifanny Portela, Marco Hutter, Marc Pollefeys, Seungryong Kim, Sunghwan Hong
General AI
Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors from large-scale foundation models, but the…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-16 · Md Tawkat Islam Khondaker, Raymond Li, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Issam H. Laradji
General AI
Deep research (DR) systems are increasingly used for complex information-seeking tasks, but existing works mainly focus on generating reports and summaries. In contrast, many enterprise tasks require an agent to identify concrete workflows, i.e., sequences of action steps. For example, rather than summarizin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-17 · Yu Zhang, Kangyi Ji, Yongxiang Zou, Rongtao Xu, Feng Zheng, Long Cheng
General AI
This paper presents an invertible neural network adapter for general robotic manipulation, designed to generate precise high-dimensional actions conditioned on multimodal observations, including visual, linguistic, and proprioceptive inputs, through a one-step denoising process. Built upon a flow-matching formulation, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-17 · Sanghyeok Choi, Henry Gouk, Esmeralda S. Whitammer
General AI
The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a difficult inference problem. We propose Large Language Gibbs, a scheme for structured probabilist…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.3
2026-06-17 · Haodong Chen, Xuanhe Zhou, Wei Zhou, Xinyue Shao, Yanbing Zhu, Bo Wang, Jiawei Hong, Anya Jia, Fan Wu
General AI
Automatically generating slide decks from source documents is an important application of large language models (LLMs). Existing benchmarks primarily assess slide completeness and technical depth, while overlooking the target audience as a critical real-world factor. For instance, specialists demand rigorous proofs, wh…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-06-23 · Xingjian Leng, Jaskirat Singh, Zhanhao Liang, Ethan Smith, Martin Bell, Aninda Saha, Yuhui Yuan, Liang Zheng
General AI
Diffusion transformer (DiT) research on image generation has converged to a single evaluation setup: class-conditional generation on ImageNet. While methods improve the FID and related metrics, it is increasingly unclear whether they reflect real progress in generative modeling. The natural alternative, i.e., text-to-i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.2
2026-06-23 · Orest Kupyn, Goutam Bhat, Philipp Henzler, Fabian Manhardt, Christian Rupprecht, Federico Tombari
General AI
Generating explorable 3D scenes from a single image requires strong generative priors and accurate geometric representations suitable for downstream use. Current video diffusion models offer high-quality generation and implicitly encode multi-view geometric structure in latent space. However, existing feedforward laten…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-03-11 · Hyungjoo Chae, Jungsoo Park, Alan Ritter
Research Track B · General AI
Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites in…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.0
2026-03-22 · Alfred Shen, Aaron Shen
Research Track A · General AI
Current AI agent frameworks commit early to a single interaction protocol, a fixed tool integration strategy, and static user models, limiting their deployment across diverse interaction paradigms. To address these constraints, we introduce STEM Agent (Self-adapting, Tool-enabled, Extensible, Multi-agent), a modular ar…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.0
2026-04-02 · Kang-Sin Choi
Research Track A · General AI
We propose LSCP, a self-gated post-training framework for autonomous knowledge acquisition: learning only what a model does not already know, verified against what it does know, at a strength proportional to conviction, with no external oracle. When a passage produces anomalously high per-token loss, LSCP flags it, gen…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-04-08 · Jiaming Cheng, Duong Tung Nguyen
Research Track A · General AI
Deploying large language model (LLM) inference at scale requires jointly selecting base models, provisioning heterogeneous GPUs, configuring parallelism, and distributing workloads under tight latency, accuracy, and budget constraints. Exact mixed-integer linear programming (MILP) approaches guarantee optimality but sc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-04-17 · Ulrich Tan
Research Track A · General AI
We introduce the Tan-HWG framework (Hebbian-Wasserstein-Geometry), a geometric theory of Hebbian plasticity in which memory states are modeled as probability measures evolving through Wasserstein minimizing movements. Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-04-30 · Haofei Yu, Yining Zhao, Lenore Blum, Manuel Blum, Paul Pu Liang
Research Track B · General AI
Despite remarkable advances, today's AI systems remain narrow in scope, falling short of the flexible, adaptive, and multisensory intelligence that characterizes human capabilities. This gap has fueled longstanding debates about whether AI might one day achieve human-like generality or even consciousness, and whether t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.0
2026-05-04 · Zhisheng Tang, Mayank Kejriwal
Research Track B · General AI
Research funding discovery remains fundamentally fragmented: researchers navigate disparate agency portals (e.g., in the United States, NSF, NIH, DARPA, Grants.gov, and many others) with heterogeneous interfaces, search capabilities, and data schemas. We present a compound AI system that unifies this landscape through …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-05-07 · Xinmiao Huang, Jinwei Hu, Rajarshi Roy, Changshun Wu, Yi Dong, Xiaowei Huang
Research Track B · General AI
Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. We introduce PrefixG…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-05-08 · Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao, Tianqing Zhu, Xiaohua Jia
Research Track B · General AI
Browser agents are increasingly deployed in long-horizon tasks, which require executing extended action chains to accomplish user goals. However, this prolonged execution process provides attackers with more opportunities to inject malicious instructions. Existing prompt injection attacks against browser agents expose …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-05-11 · Ihor Stepanov, Oleksandr Lukashov, Mykhailo Shtopko, Vivek Kalyanarangan
General AI
Joint named entity recognition (NER) and relation extraction (RE) is a fundamental task in natural language processing for constructing knowledge graphs from unstructured text. While recent approaches treat NER and RE as separate tasks requiring distinct models, we introduce GLiNER-Relex, a unified architecture that ex…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-05-14 · Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ted Chaiwachirasak, Han Li, Lingyun Wang
Research Track B · General AI
LLM-based web agents can navigate live storefronts, yet they often collapse to a single "average buyer" policy, failing to capture the heterogeneous and distributional nature of real buyer populations. Existing personalization methods rely on hand-crafted prompt-based personas that are brittle, difficult to scale, cont…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.0
2026-05-27 · Kenny Daniel
Research Track B · General AI
The fastest-growing data in production today is unstructured text: agent traces, chat logs, reasoning chains, model outputs. People want to analyze it, and the questions worth asking ("show me where the agent got confused") cannot be answered by SQL alone, since text is not queryable without a model in the query path. …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-05-28 · Vaishali Senthil, Ashutosh Hathidara, Sebastian Schreiber
General AI
Tool retrieval over large API catalogs is a core bottleneck for LLM agents: user queries arrive in colloquial, often underspecified language, while the catalog uses technical API vocabulary that no fixed encoder can bridge on its own. The two dominant training approaches, contrastive encoder fine-tuning and HyDE-style …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-05-28 · Yue Zhang, Zun Wang, Han Lin, Yonatan Bitton, Idan Szpektor, Mohit Bansal
General AI
Spatial reasoning is a fundamental capability for vision-language models (VLMs) deployed in real-world environments. However, visual observations are inherently limited representations of a 3D world: occlusion can render objects invisible, and perspective can make geometric properties misleading. Despite this, existing…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-05-28 · Cheolhong Min, Jaeyun Jung, Daeun Lee, Hyeonseong Jeon, Yu Su, Jonathan Tremblay, Chan Hee Song, Jaesik Park
General AI
Vision-language models (VLMs) achieve strong performance on spatial reasoning benchmarks, yet it remains unclear whether this reflects structured 3D understanding or reliance on statistical shortcuts in natural images. We introduce a representation-level analysis framework that constructs minimal contrastive pairs to m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 10.0
2026-06-04 · Marius Dragoi, Ioana Pintilie, Alexandra Dragomir, Antonio Barbalau, Florin Brad
Research Track A
Parameter-efficient finetuning methods based on spectral decomposition have enabled progress in Continual Learning. In this paper we introduce TailLoR, which utilizes the singular bases U and V of the pre-trained weights as a fixed reference frame to learn a low-rank update applied to the singular value matrix. A soft …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.0
2026-06-18 · Ganlin Yang, Zhangzheng Tu, Yuqiang Yang, Sitong Mao, Junyi Dong, Tianxing Chen, Jiaqi Peng, Jing Xiong, Jiafei Cao, Jifeng Dai, Wengang Zhou, Yao Mu, Tai Wang
General AI
Memory remains a critical bottleneck for long-horizon robotic manipulation, as standard Vision-Language-Action (VLA) policies often fail when task-relevant cues become occluded or unobservable over time. While existing memory-augmented methods utilize historical context, they either suffer from severe information bottl…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-06-19 · Jiehui Huang, Yuechen Zhang, Bin Xia, Jiahao Wang, Xu He, Zhenchao Tang, Meng Chu, Xin Tao, Pengfei Wan, Jiaya Jia
General AI
Generating a coherent multi-shot video requires structured cross-shot memory. Subject appearance, scene context, and speaker identity must persist across cuts. Existing approaches either train end-to-end over fixed-length sequences and cannot scale, generate shot-by-shot with memory banks that grow linearly, or orchest…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-06-27 · Yongjin Yang, Jiarui Liu, Yinghui He, Lechen Zhang, Bernhard Schölkopf, Zhijing Jin
General AI
Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-06-29 · Daniyel Ayupov, Artur Markov-Tsoy
General AI
We present DreamForge-World 0.1 Preview, a preview foundational world model for real-time interactive world simulation. The system adapts the LongLive 1 autoregressive video stack, itself derived from Wan2.1-T2V-1.3B, with a residual action pathway inspired by the Matrix-Game family. DreamForge-World 0.1 Preview focuse…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.0
2026-06-29 · Yuxi Wang, Chengkai Jin, Yufei Liu, Wenqi Ouyang, Tianyi Wei, Zhiwei Zeng, Siyuan Huang, Zhiqi Shen, Xingang Pan
General AI
4D hand motion reconstruction from egocentric video is bottlenecked by clear limitations of existing methods: image-based pipelines depend on a detector that fails under heavy occlusion, while video-based methods rely on temporal modules learned only from scarce hand-pose annotations, a narrow signal insufficient to mo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-04-30 · Andac Demir, Erik W. Anderson, Jeremy L. Jenkins, Srayanta Mukherjee
General AI
In this work, we introduce CellxPert, a scalable multimodal foundation model that unifies single-cell and spatial multi-omics within a common representation space. CellxPert jointly encodes transcriptomic (scRNA-seq), chromatin-accessibility (ATAC-seq), and surface-proteomic (CITE-seq) measurements, while directly inco…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-05-02 · Zhiwen Ruan, Yichao Du, Jianjie Zheng, Longyue Wang, Yun Chen, Peng Li, Jinsong Su, Yang Liu, Guanhua Chen
General AI
A promising paradigm for adapting instruction-tuned language models is to learn task-specific updates on a pretrained base model and subsequently merge them into the instruction-tuned model. However, existing approaches typically treat the instruction-tuned model as a passive target that is only involved at the final m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-03 · Zongqian Li, Yixuan Su, Han Zhou, Zihao Fu, Nigel Collier
General AI
Parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) have become essential for deploying large language models, yet their static parameter allocation remains suboptimal for inputs of varying complexity. We present Flexi-LoRA, a novel framework that dynamically adjusts LoRA ranks based on input comple…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-03 · Yiyao Wang, Sixian Zhang, Keming Zhang, Xinhang Song, Songjie Du, Shuqiang Jiang
General AI
Existing zero-shot Object Goal Navigation (ObjectNav) methods often exploit commonsense knowledge from large language or vision-language models to guide navigation. However, such knowledge arises from internet-scale text rather than embodied 3D experience, and episodic observations collected during navigation are typic…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-04 · Yu-Ju Tsai, Brian Price, Qing Liu, Luis Figueroa, Daniil Pakhomov, Zhihong Ding, Scott Cohen, Ming-Hsuan Yang
General AI
Personalized image completion aims to restore occluded regions in personal photos while preserving identity and appearance. Existing methods either rely on generic inpainting models that often fail to maintain identity consistency, or assume that suitable reference images are explicitly provided. In practice, suitable …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-04 · Frederic Grabowski, Jacek Szczerbiński, Maciej Jaśkowski, Kalina Jasińska-Kobus, Paweł Dąbrowski-Tumański, Tomasz Jetka, Bartosz Topolski
General AI
Molecular property models increasingly support high-stakes drug-discovery decisions, but their outputs are often difficult to audit: classical predictors return scores without rationale, while language models can produce fluent explanations weakly grounded in the input molecule. We introduce Bolek, a compact multimodal…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-04 · Danil Tokhchukov, Veronika Morozova, Gonzalo Ferrer
General AI
Traditional Simultaneous Localization and Mapping (SLAM) algorithms rely heavily on the static environment assumption, which severely limits their applicability in real-world spaces populated by moving entities, such as pedestrians. In this work, we propose DynoSLAM, a tightly-coupled Dynamic GraphSLAM architecture tha…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-07 · Daniel Zheng, Ingrid von Glehn, Yori Zwols, Iuliya Beloshapka, Lars Buesing, Daniel M. Roy, Martin Wattenberg, Bogdan Georgiev, Tatiana Schmidt, Andrew Cowie, Fernanda Viegas, Dimitri Kanevsky, Vineet Kahlon, Hartmut Maennel, Sophia Alj, George Holland, Alex Davies, Pushmeet Kohli
General AI
We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature search, computation…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-07 · Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig
General AI
We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-11 · Pau de las Heras Molins, Beyazit Yalcinkaya, Lasse Peters, David Fridovich-Keil, Georgios Bakirtzis
General AI
Multi-objective reinforcement learning (MORL) allows a user to express preference over outcomes in terms of the relative importance of the objectives, but standard metrics cannot capture whether changes in preference reliably change the agent's behavior in the intended way, a property termed controllability. As a resul…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-12 · Miaosen Zhang, Xiaohan Zhao, Zhihong Tan, Zhou Huoshen, Yijia Fan, Yifan Yang, Kai Qiu, Bei Liu, Justin Wagle, Chenzhong Yin, Mingxi Cheng, Ji Li, Qi Dai, Chong Luo, Xu Yang, Xin Geng, Baining Guo
Research Track B · General AI
Computer-use agents (CUAs) automate on-screen work, as illustrated by GPT-5.4 and Claude. Yet their reliability on complex, low-frequency interactions is still poor, limiting user trust. Our analysis of failure cases from advanced models suggests a long-tail pattern in GUI operations, where a relatively small fraction …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-12 · Jacob Fein-Ashley, Paria Rashidinejad
General AI
Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, improving language modeling and reasoning. Yet recurrent architectures remain unstable to train, costly to optimize and deploy, and constrained to small, fixed recurrence depths. We intro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-21 · Guangya Hao, Yunbo Long, Zhuokai Zhao
General AI
Self-evolving multi-agent systems (MAS) have emerged as a promising route to LLM agents that continually improve from experience, with persistent memory at their foundation. However, existing designs almost exclusively adopt a centralized repository shared across agents, incurring communication and coordination overhea…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-05-22 · Alessandro Sosso, Akhil Arora, Bas Spitters
General AI
Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation. Our results…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-22 · Aneesh Komanduri, Xintao Wu
General AI
Causal generative modeling is essential for developing reliable and transparent AI systems capable of counterfactual reasoning. While existing approaches focus on integrating causal constraints during the training of generative models, they often lack a unified framework to leverage the zero-shot reasoning capabilities…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-22 · Jianshu Zhang, Yijiang Li, Huifeixin Chen, Haoran Lu, Letian Xue, Bingyang Wang, Han Liu
General AI
Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce numerical outputs such as action magnitudes and spatial coordinates. Although these numbers appear meaningful, it remains unclear whether these numerical outputs are genuinely grounded in spatial perception. Theref…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-27 · Junfeng Nie, Alvin Jin, Xiaohui Chen
General AI
Existing approaches for synthetic tabular data generation are based on either purely generative models or LLMs, both of which struggle with data heterogeneity, logical consistency, rare-event coverage, and robustness in low-data regimes. In this paper, we propose a hierarchical hybrid top-down and bottom-up (H-TDBU) fr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-05-28 · Tingle Li, Siddharth Gururani, Kevin J. Shih, Gantavya Bhatt, Sang-gil Lee, Zhifeng Kong, Arushi Goel, Gopala Anumanchipalli, Ming-Yu Liu
General AI
Generative video-to-audio (V2A) models produce highly plausible soundtracks, but it remains unclear whether they capture the underlying physical processes. Existing evaluations emphasize perceptual realism and overlook physical correctness under controlled interventions. In this paper, we introduce FlatSounds, a benchm…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-28 · Hidir Yesiltepe, Jiazhen Hu, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, Hoda Eldardiry, Pinar Yanardag
General AI
Long-rollout causal video diffusion has converged on a fixed-size sliding-window KV cache, with recent progress innovating within this layout by changing which tokens occupy the window or how their positions are encoded. The per-head KV layout itself, a dominant contributor to streaming memory and latency, has been mos…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-28 · Benjamin A. Burns, Sara Fridovich-Keil
General AI
Diffusion models have excellent capacity to model complex distributions of natural data, which has made them a popular and effective choice for posterior sampling in imaging inverse problems. Existing methods can incorporate any measurement model at inference time but must use an inexact approximation for the likelihoo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-29 · Adrian de Wynter
General AI
Much research has been carried out on large language models (LLMs) and LLM-powered agentic workflows. However, many works within the field state emergence of, ascribe to, or assume, generalised anthropomorphic attributes to them (e.g., morality or understanding of natural language). Our goal is not to argue in favour o…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-05-29 · Olaf Dünkel, Basavaraj Sunagad, Haoran Wang, David T. Hoffmann, Christian Theobalt, Adam Kortylewski
General AI
Measuring structured object understanding in vision foundation models remains challenging due to inconsistent evaluation protocols and limited part-level supervision. Semantic correspondence (SC) evaluates this capability by testing whether object parts can be matched across instances and categories under large variati…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.8
2026-06-29 · Nicola Borri, Yukun Liu, Aleh Tsyvinski
General AI
Using 380 trillion tokens of realized AI consumption across more than four hundred large language models from the licensed proprietary OpenRouter dataset covering approximately 2 percent of current global monthly AI token consumption, we analyze how AI affects firms, markets, and workers. Leveraging the unprecedented s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-06-29 · Haoran Jin, Xiting Wang, Shijie Ren, Hong Xie, Defu Lian
General AI
Sparse Autoencoders (SAEs) are widely used to interpret large language models by decomposing activations into sparse, human-understandable features, but scaling to large dictionaries exposes fundamental challenges. Systematic studies reveal pervasive feature splitting that fragments coherent concepts into non-atomic la…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-06-29 · Taixi Chen, Nancy Guo
General AI
Large-scale multimodal models (LMMs) have achieved superior performance in visual recognition by synergizing information across diverse, massive-scale paired modalities. In real-world scenarios, however, missing-modality inputs are ubiquitous, causing models optimized for modality-complete data to exhibit precipitous p…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-06-29 · Lei Bai, Zongsheng Cao, Yang Chen, Zhiyao Cui, Shangheng Du, Yue Fan, Shiyang Feng, Zijie Guo, Haonan He, Liang He, Xiaohan He, Shuyue Hu, Yusong Hu, Songtao Huang, Yichen Jiang, Hao Li, Xin Li, Dahua Lin, Weihao Lin, Fenghua Ling, Dongrui Liu, Zhuo Liu, Runmin Ma, Chunjiang Mu, Haoyang Peng, Tianshuo Peng, Jinxin Shi, Luohe Shi, Boyuan Sun, Zelin Tan, Shengji Tang, Qianyi Wang, Yiming Wu, Yi Xie, Xiangchao Yan, Jingqi Ye, Peng Ye, Fangchen Yu, Jiakang Yuan, Bihao Zhan, Bo Zhang, Chen Zhang, Shufei Zhang, Shuaiyu Zhang, Wenlong Zhang, Yiqun Zhang, Junpeng Zhao, Zhijie Zhong, Bowen Zhou, Yuhao Zhou
General AI
We introduce Agents-A1, a 35B Mixture-of-Experts Agentic Model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-ho…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.8
2026-07-01 · Max Van Puyvelde, Halil Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert
General AI
Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, remain almost entirely autoregressive. We adapt a mixture-of-experts diffusion language mo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.8
2026-07-02 · Jiayin Zhu, Kelong Mao, Yudong Guo, Dengbo He, Sulong Xu, Simiu Gu, Yutao Yue
General AI
Skills are becoming a reusable operational layer for LLM agents, encoding SOPs, domain rules, tool workflows, scripts, and validation routines. In realistic skill repositories, overlapping skills make reliable skill-use difficult. Final verifier success is too coarse for both evaluation and training, since an agent may…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-07-01 · Ran Yan, Wei Fu, Jiale Li, Shusheng Xu, Zhiyu Mei, Jiaxuan Gao, Jiarui Zhang, Wentai Zhang, Hao Dai, Xujie Shen, Chuyi He, Zhen Pu, Jun Mei, Zhiyao Lin, Haitao Wang, Zhiqiang Ding, Jiawei Zhang, Huaijie Wang, Ruida Xu, Honghua Dong, Youhe Jiang, Yi Wu, Tongkai Yang, Binhang Yuan
General AI
LLM agents are rapidly being deployed in production, including coding assistants, customer-support chatbots, and scientific research assistants, yet they remain fundamentally static in enterprise deployment. The LLM weights, system prompts, tool repertoires, and in-context harnesses are frozen at deployment time, and a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-07-02 · Kent K. Chang
General AI
Language models are increasingly used to quantify cultural phenomena, but what makes such measurement distinctively cultural? This paper argues that NLP work on culture is a material-discursive practice: the apparatus -- model, data, annotation, evaluation -- participates in constituting the cultural reality it measure…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-07-02 · Mona Schirmer, Metod Jazbec, Alexander Timans, Christian Naesseth, Maja Waldron, Eric Nalisnick
General AI
Despite alignment training, LLMs remain prone to generating unsafe outputs at deployment time. Monitoring outputs online and raising an alarm when safety can no longer be assumed is therefore critical. We study a simple real-time monitor that turns a verifier signal from an external model into an alarm decision by thre…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.5
2026-01-14 · Saber Zerhoudi, Michael Granitzer
Research Track B · General AI
A fundamental tension exists between the demand for sophisticated AI assistance in web search and the need for user data privacy. Current centralized models require users to transmit sensitive browsing data to external services, which limits user control. In this paper, we present a browser extension that provides a vi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-02-10 · Talor Abramovich, Maor Ashkenazi, Carl Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman
General AI
Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and representative workloads are essential for accurately measuring its effectiveness. Existin…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.5
2026-03-04 · Soroush Nasiriany, Sepehr Nasiriany, Abhiram Maddukuri, Yuke Zhu
Research Track A · General AI
Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present Rob…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-03-14 · Seokmin Lee, Yunghee Lee, Byeonghyun Pak, Byeongju Woo
General AI
For robotic agents operating in dynamic environments, learning visual state representations from streaming video observations is essential for sequential decision making. Recent self-supervised learning methods have shown strong transferability across vision tasks, but they do not explicitly address what a good visual …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-03-15 · Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee
General AI
Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improve LALM reasoning. We introduce three strategies using diverse i…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-03-19 · Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang, Zhehuai Chen, Sung-Feng Huang, Chih-Kai Yang, Yi-Cheng Lin, Chi-Yuan Hsiao, Wenze Ren, En-Pei Hu, Yu-Han Huang, An-Yu Cheng, Cheng-Han Chiang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee
General AI
Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.5
2026-03-22 · Liang Ding
Research Track B · General AI
LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER,…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.5
2026-03-24 · Wanying Mo, Jijia Lai, Xiaoming Wang
Research Track B · General AI
Browser agents built on LLMs can act in web interfaces, yet most remain confined to a single chat surface (e.g., a sidebar). This mismatch with real browsing can increase context-switching and reduce user control. We introduce IntentWeave, a design space of ten spatial paradigms for embedding agentic assistanc…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.5
2026-03-26 · Jiaqing Zhang, Hao Wang, Mingjia Yin, Bo Chen, Qinglin Jia, Rui Zhou, Ruiming Tang, ChaoYi Ma, Enhong Chen
Research Track A · General AI
Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture comparison or iteration is prohibitively expensive, severely slowing down model deve…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-03-27 · Zhaochong An, Orest Kupyn, Théo Uscidda, Andrea Colaco, Karan Ahuja, Serge Belongie, Mar Gonzalez-Franco, Marta Tintore Gazulla
General AI
Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compromise the generaliza…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-03-30 · Zhang Li, Zhibo Lin, Qiang Liu, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiajun Song, Jiarui Zhang, Xiang Bai, Yuliang Liu
General AI
We introduce Multilingual Document Parsing Benchmark, the first benchmark for multilingual digital and photographed document parsing. Document parsing has made remarkable strides, yet almost exclusively on clean, digital, well-formatted pages in a handful of dominant languages. No systematic benchmark exists to evaluat…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-03-31 · Qiyao Wang, Hongbo Wang, Longze Chen, Zhihao Yang, Guhong Chen, Hamid Alinejad-Rokny, Hui Li, Yuan Lin, Min Yang
General AI
Scientific idea generation (SIG) is critical to AI-driven autonomous research, yet existing approaches are often constrained by a static retrieval-then-generation paradigm, leading to homogeneous and insufficiently divergent ideas. In this work, we propose FlowPIE, a tightly coupled retrieval-generation framework that …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.5
2026-03-31 · Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar
Research Track B · General AI
There has been growing interest in building agents that can interact with digital platforms to execute meaningful enterprise tasks autonomously. Among the approaches explored are tool-augmented agents built on abstractions such as Model Context Protocol (MCP) and web agents that operate through graphical interfaces. Ye…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-06 · Haoxuan Han, Weijie Wang, Zeyu Zhang, Yefei He, Bohan Zhuang
Research Track A · General AI
Recent advancements in Vision-Language Models (VLMs) have significantly pushed the boundaries of Visual Question Answering (VQA). However, high-resolution details can sometimes become noise that leads to hallucinations or reasoning errors. In this paper, we propose Degradation-Driven Prompting (DDP), a novel framework tha…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-08 · Yuechen Jiang, Enze Zhang, Md Mohsinul Kabir, Qianqian Xie, Stavroula Golfomitsou, Konstantinos Arvanitis, Sophia Ananiadou
General AI
Recent advances in vision-language models (VLMs) have improved image captioning for cultural heritage. However, inferring structured cultural metadata (e.g., creator, origin, period) from visual input remains underexplored. We introduce a multi-category, cross-cultural benchmark for this task and evaluate VLMs using an…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-08 · Jianhui Liu, Haoze Sun, Wenbo Li, Yanbing Zhang, Rui Yang, Zhiliang Zhu, Yijun Yang, Shenghe Zheng, Nan Jiang, Jiaxiu Jiang, Haoyang Huang, Tien-Tsin Wong, Nan Duan, Xiaojuan Qi
General AI
Spatial understanding is a fundamental cornerstone of human-level intelligence. Nonetheless, current research predominantly focuses on domain-specific data production, leaving a critical void: the absence of a principled, open-source engine capable of fully unleashing the potential of high-quality spatial data. To brid…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-12 · Song Jin, Juntian Zhang, Xun Zhang, Zeying Tian, Fei Jiang, Guojun Yin, Wei Lin, Yong Liu, Rui Yan
General AI
Recent advancements in Vision-Language Models (VLMs) have revolutionized general visual understanding. However, their application in the food domain remains constrained by benchmarks that rely on coarse-grained categories, single-view imagery, and inaccurate metadata. To bridge this gap, we introduce DiningBench, a hie…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-12 · Udari Madhushani Sehwag, Elaine Lau, Haniyeh Ehsani Oskouie, Shayan Shabihi, Erich Liang, Andrea Toledo, Guillermo Mangialardi, Sergio Fonrouge, Ed-Yeremai Hernandez Cardona, Paula Vergara, Utkarsh Tyagi, Chen Bo Calvin Zhang, Pavi Bhatter, Nicholas Johnson, Furong Huang, Ernesto Gabriel Hernandez Montoya, Bing Liu
General AI
Accelerating scientific discovery requires the identification of which experiments would yield the best outcomes before committing resources to costly physical validation. While existing benchmarks evaluate LLMs on scientific knowledge and reasoning, their ability to predict experimental outcomes - a task where AI coul…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-15 · Team HY-World, Chenjie Cao, Xuhui Zuo, Zhenwei Wang, Yisu Zhang, Junta Wu, Zhenyang Liu, Yuning Gong, Yang Liu, Bo Yuan, Chao Zhang, Coopers Li, Dongyuan Guo, Fan Yang, Haiyu Zhang, Hang Cao, Jianchen Zhu, Jiaxin Lin, Jie Xiao, Jihong Zhang, Junlin Yu, Lei Wang, Lifu Wang, Lilin Wang, Linus, Minghui Chen, Peng He, Penghao Zhao, Qi Chen, Rui Chen, Rui Shao, Sicong Liu, Wangchen Qin, Xiaochuan Niu, Xiang Yuan, Yi Sun, Yifei Tang, Yifu Sun, Yihang Lian, Yonghao Tan, Yuhong Liu, Yuyang Yin, Zhiyuan Min, Tengfei Wang, Chunchao Guo
General AI
We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view image inputs, the mo…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-16 · Quyen Tran, Hai Nguyen, Hoang Phan, Quan Dao, Linh Ngo, Khoat Than, Dinh Phung, Dimitris Metaxas, Trung Le
General AI
In online incremental learning, data continuously arrives with substantial distributional shifts, creating a significant challenge because previous samples have limited replay value when learning a new task. Prior research has typically relied on either a single adaptive centroid or multiple fixed centroids to represen…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-18 · Xinru Yan, Boxi Cao, Yaojie Lu, Hongyu Lin, Weixiang Zhou, Le Sun, Xianpei Han
General AI
Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this gap, we first systematically quantify modality preference of OLLMs using …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-19 · Liyang Wang, Zeyu Zhang, Hao Tang
General AI
Scene graph representations enable structured visual understanding by modeling objects and their relationships, and have been widely used for multiview and 3D scene reasoning. Existing methods such as MSG learn scene graph embeddings in Euclidean space using contrastive learning and attention based association. However…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.5
2026-04-20 · Weixi Tong, Yifeng Di, Tianyi Zhang
Research Track B · General AI
Existing web agents typically initiate exploration from the root URL, which is inefficient for complex websites with deep hierarchical structures. Without a global view of the website's structure, agents frequently fall into navigation traps, explore irrelevant branches, or fail to reach target information within a lim…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-20 · Yu Zhang, Chuyang Sun, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang
General AI
Large Vision-Language Models (LVLMs) still struggle with vision hallucination, where generated responses are inconsistent with the visual input. Existing methods either rely on large-scale annotated data for fine-tuning, which incurs massive computational overhead, or employ static post-hoc strategies that overlook the…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-22 · Hardy Chen, Nancy Lau, Haoqin Tu, Shuo Yan, Xiangyan Liu, Zijun Wang, Juncheng Wu, Michael Qizhe Shieh, Alvaro A. Cardenas, Cihang Xie, Yuyin Zhou
General AI
Frontier coding agents are increasingly used in workflows where users supervise progress primarily through repeated improvement of a public score, namely the reported score on a public evaluation file with labels in the workspace, rather than through direct inspection of the agent's intermediate outputs. We study wheth…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.5
2026-04-23 · Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna
Research Track A · General AI
On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative thr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-23 · Mohammed Safi Ur Rahman Khan, Sanjay Suryanarayanan, Tushar Anand, Mitesh M. Khapra
General AI
Large Vision-Language Models (VLMs) are increasingly used to evaluate outputs of other models, both for image-to-text (I2T) tasks such as visual question answering and for text-to-image (T2I) generation tasks. Despite this growing reliance, the reliability of these Evaluator VLMs remains underexplored. In this work, we system…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-24 · Zhengxu Yu, Yu Fu, Zhiyuan He, Yuxuan Huang, Lee Ka Yiu, Meng Fang, Weilin Luo, Jun Wang
General AI
Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. We argue that this reflects a deeper absence: a principled organisational layer that gove…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-25 · Yihan Wang, Lei Li, Yao Lai, Jing Wang, Yan Lu
General AI
Analog circuit design relies heavily on reusing existing intellectual property (IP), yet searching across heterogeneous representations such as SPICE netlists, schematics, and functional descriptions remains challenging. Existing methods are largely limited to exact matching within a single modality, failing to capture…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-04-25 · Yizheng Huang, Wenjun Zeng, Aditi Kumaresan, Zi Wang
General AI
Evaluating generative AI models is increasingly resource-intensive due to slow inference, expensive raters, and a rapidly growing landscape of models and benchmarks. We propose ProEval, a proactive evaluation framework that leverages transfer learning to efficiently estimate performance and identify failure cases. ProE…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.5
2026-04-26 · Tin Nguyen, Thang T. Truong, Runtao Zhou, Trung Bui, Chirag Agarwal, Anh Totti Nguyen
Research Track B · General AI
Users browsing the web daily struggle to quickly locate relevant information in cluttered pages, complete unfamiliar multi-step tasks, and stay focused amid distracting content. State-of-the-art AI assistants (e.g., ChatGPT, Gemini, Claude) and browser agents (e.g., OpenAI Operator, Browser Use) can answer questions an…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-26 · Qi Li, Bo Yin, Weiqi Huang, Ruhao Liu, Bojun Zou, Runpeng Yu, Jingwen Ye, Weihao Yu, Xinchao Wang
General AI
Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language, and state, real-time…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-27 · Hongxin Li, Xiping Wang, Jingran Su, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang
Research Track B · General AI
Autonomous agents capable of navigating Graphical User Interfaces (GUIs) hold the potential to revolutionize digital productivity. However, achieving true digital autonomy extends beyond reactive element matching; it necessitates a predictive mental model of interface dynamics and the ability to foresee the "digital wo…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-27 · Yiming Zhang, Jiacheng Chen, Jiaqi Tan, Yongsen Mao, Wenhu Chen, Angel X. Chang
General AI
Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally curated for traditional 3D perception. When such annotations are treated as ground truth …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-29 · Yibin Luo, Shiwei Gao, Huichuan Zheng, Youyou Lu, Jiwu Shu
General AI
Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism combined with CPU offloading mitigates these hardware bottlenecks by reducing communication overhead. However, existing PP schedules suffer fr…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-04-30 · Hanzhong Guo, Jie Wu, Jie Liu, Yu Gao, Zilyu Ye, Linxiao Yuan, Xionghui Wang, Yizhou Yu, Weilin Huang
General AI
While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is the lack of a robust general reward model for all editing tasks. Existing edit reward models usually give overall scores wi…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-05-01 · Indraneil Paul, Goran Glavaš, Iryna Gurevych
General AI
Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.5
2026-05-06 · William T. Redman, Erik C. Johnson, Brian Robinson
Research Track A · General AI
Identifying and exploiting common features across domains is at the heart of the human ability to make analogies, and is believed to be crucial for the ability to continually learn. To do this successfully, general and flexible computational strategies must be developed. While the extent to which Transformer neural net…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.5
2026-05-07 · Wenhan Zheng, Yuyi Mao, Ivan Wang-Hei Ho
Research Track A
Channel state information (CSI)-based human activity recognition (HAR) is vulnerable to performance degradation under domain shifts across varying physical environments. Continual learning (CL) offers a principled way to learn new domains sequentially while preserving past knowledge, but existing CL solutions for CSI-b…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-05-28 · Hesong Wang, Xin Jin, Lu Lu, Chenhaowen Li, Jian Chen, Qiang Liu, Huan Wang
Research Track A · General AI
Video large language models (Video-LLMs) have demonstrated strong capabilities in video understanding tasks. However, their practical deployment is still hindered by the inefficiency introduced by processing massive amounts of visual tokens. Although recent approaches achieve extremely low token retention ratios while …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.5
2026-05-29 · Xiaosong Han, Ke Chen, Xindi Dai, Di Liang, Minlong Peng, Wei Pang, Fausto Giunchiglia, Xiaoyue Feng, Yonghao Liu, Renchu Guan
Research Track A · General AI
In real-world deployment, LLMs are often adapted continually across tasks to keep them up to date in production, where new fine-tuning should preserve previously learned skills. However, indiscriminately mixing tasks can dilute task specialization, while sequential fine-tuning (full-parameter or low-rank adaptation) of…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-06-04 · Yuejie Li, Yueying Hua, Ke Yang, Li Zhang, Yueping He, Ruiqi Li, Bolin Chen, Tao Wang, Bowen Li, Chengjun Mao
General AI
Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before a smaller reader, lifting F1 by tens of points on multi-hop benchmarks; this gain is typically credited to improved evidence quality. We ask whether that lift is causally driven by the gold answer string appearing in the rewr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.5
2026-06-09 · Yu Lu, Junjie Yang, Piotr Koniusz, YuXin Song, Yi Yang
Research Track A · General AI
Autoregressive video generators synthesize long videos by generating successive temporal segments, but their historical KV cache grows with video length. Existing bounded-cache methods reduce this cost with local windows, sink tokens, or compressed memory states, yet they usually assign fixed roles to different parts o…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-06-10 · Zhuofan Shi, Mingzhe Ma, Lu Wang, Fangkai Yang, Pu Zhao, Yiming Guan, Youling Huang, Wei Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan
General AI
Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several directions look plausible but only some will later lead to reliable evidence. If an agent greedily follows the current best-look…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.5
2026-06-15 · Yinhan He, Liam Collins, Bhuvesh Kumar, Jundong Li, Neil Shah, Donald Loveland
General AI
Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet reliably invoking this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs), disruptin…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-06-15 · Shuai Yang, Bingjie Gao, Ziwei Liu, Jiaqi Wang, Dahua Lin, Tong Wu
General AI
Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs struggle to maintain long-term consistency after such modifications, as stored contexts may …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-06-16 · Zhexiao Xiong, Yizhi Song, Hao Kang, Qing Yan, Liming Jiang, Jenson Yang, Zhoujie Fu, Stathi Fotiadis, Angtian Wang, Zichuan Liu, Bo Liu, Yiding Yang, Xin Lu, Nathan Jacobs
Research Track A · General AI
Interactive world models aim to simulate environment dynamics under real-time user actions. However, their action vocabulary is largely confined to navigation: most actions correspond to motion (e.g., walk, turn, look around), while interaction with objects in the scene (e.g., pick up plates, open doors, or trigger phy…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-06-16 · Byung-Kwan Lee, Ximing Lu, Shizhe Diao, Minki Kang, Saurav Muralidharan, Karan Sapra, Andrew Tao, Pavlo Molchanov, Yejin Choi, Yu-Chiang Frank Wang, Ryo Hachiuma
General AI
Knowledge distillation transfers a teacher's competence to a small student but is brittle in the small-student regime: forcing the student to imitate logits from a much larger teacher concentrates it on the teacher's sharpest modes, hurting generalization on benchmark families beyond the training corpus. Reinforcement …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2025-12-08 · Alisha Ukani, Hamed Haddadi, Ali Shahin Shamsabadi, Peter Snyder
Research Track B · General AI
This paper presents a systematic evaluation of the privacy behaviors and attributes of eight recent, popular browser agents. Browser agents are software that automate Web browsing using large language models and ancillary tooling. However, the automated capabilities that make browser agents powerful also make them high…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-03-03 · Patrick J. Mineault, Thomas L. Griffiths, Sean Escola
Research Track A · General AI
We propose that the jagged intelligence landscape of modern AI systems arises from a missing training signal that we call "cognitive dark matter" (CDM): brain functions that meaningfully shape behavior yet are hard to infer from behavior alone. We identify key CDM domains – metacognition, cognitive flexibility, episodic …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-03-26 · Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li, Alex Smola
General AI
Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which condi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-03-26 · Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang, Mingxi Cheng, Qi Dai, Yuqing Yang, Lili Qiu, Zhendong Wang, Zhengyuan Yang, Xue Yang, Lijuan Wang, Ji Li, Chong Luo
General AI
Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-wo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-03-26 · Hai X. Pham, David T. Hoffmann, Ricardo Guerrero, Brais Martinez
General AI
Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard neg…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-03-26 · Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang
General AI
The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteB…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-03-30 · Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or
General AI
Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a wide range of generat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-03-31 · Yufeng Li, Rrubaa Panchendrarajan, Arkaitz Zubiaga
General AI
Verifiable claim detection asks whether a claim expresses a factual statement that can, in principle, be assessed against external evidence. As an early filtering stage in automated fact-checking, it plays an important role in reducing the burden on downstream verification components. However, existing approaches to cl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-03-31 · Prasanjit Dey, Soumyabrata Dev, Bianca Schoen-Phelan
General AI
We address the challenge of adapting pre-trained Large Language Models (LLMs) for multivariate time-series analysis, where their deployment is often hindered by prohibitive computational and memory demands. Our solution, One-for-All, introduces Gaussian Rank-Stabilized Low-Rank Adapters (rsLoRA) to enable parameter-eff…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-01 · Ankit Grover, Lodovico Giaretta, Rémi Bourgerie, Sarunas Girdzijauskas
General AI
The integration of Graph Neural Networks (GNNs) with Large Language Models (LLMs) has emerged as a promising paradigm for Graph Question Answering (GraphQA). However, effective methods for encoding complex structural information into the LLM's latent space remain an open challenge. Current state-of-the-art architecture…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-02 · Sarath Shekkizhar, Romain Cosentino, Adam Earle
General AI
Standard LLM benchmarks evaluate the assistant turn: the model generates a response to an input, a verifier scores correctness, and the analysis ends. This paradigm leaves unmeasured whether the LLM encodes any awareness of what follows the assistant response. We propose user-turn generation as a probe of this gap: giv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-02 · Chongjie Ye, Cheng Cao, Chuanyu Pan, Yiming Hao, Yihao Zhi, Yuanming Hu, Xiaoguang Han
General AI
Recent multimodal large language models have achieved strong performance in unified text and image understanding and generation, yet extending such native capability to 3D remains challenging due to limited data. Compared to abundant 2D imagery, high-quality 3D assets are scarce, making 3D synthesis under-constrained. …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-07 · Yanis Labrak, David Grünert, Séverin Baroudi, Jiyun Chun, Pawel Cyrta, Sergio Burdisso, Ahmed Hassoon, David Liu, Adam Rothschild, Reed Van Deusen, Petr Motlicek, Andrew Perrault, Ricard Marxer, Thomas Schaaf
General AI
Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-08 · Jagadeesh Chundru
Research Track B · General AI
LLM-driven web agents operating through continuous inference loops (repeatedly querying a model to evaluate browser state and select actions) exhibit a fundamental scalability constraint for repetitive tasks. We characterize this as the Rerun Crisis: the linear growth of token expenditure and API latency relative t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-04-09 · Feng Luo, Yu-Neng Chuang, Guanchu Wang, Zicheng Xu, Xiaotian Han, Tianyi Zhang, Vladimir Braverman
General AI
On-policy distillation (OPD) trains student models under their own induced distribution while leveraging supervision from stronger teachers. We identify a failure mode of OPD: as training progresses, on-policy rollouts can undergo abrupt length inflation, causing truncated trajectories to dominate the training data. Th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-09 · Boyang Zhang, Sebastián G. Acosta, Preston Carlson, Sacha Bron, Pierre-Loïc Doulcet, Simon Suo
General AI
AI agents are changing the requirements for document parsing. What matters is semantic correctness: parsed output must preserve the structure and meaning needed for autonomous decisions, including correct table structure, precise chart data, semantically meaningful formatting, and visual grounding. Existing benc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-13 · Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong
General AI
Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-13 · Minh Le-Anh, Cuong Chi Le, Tien N. Nguyen
General AI
Automated Program Repair (APR) has recently benefited from large language models (LLMs). However, most LLM-based APR approaches still rely primarily on coarse end-to-end signals from test-suite outcomes to guide repair, providing limited insight into where a program's internal logic deviates from its intended behavior.…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-13 · Donghao Zhou, Guisheng Liu, Hao Yang, Jiatong Li, Jingyu Lin, Xiaohu Huang, Yichen Liu, Xin Gao, Cunjian Chen, Shilei Wen, Chi-Wing Fu, Pheng-Ann Heng
General AI
In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference images, audio, and pose. This task holds significant practical value for automating content creation in real-world applications, such as e-commer…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-13 · Federico Bottino, Carlo Ferrero, Nicholas Dosio, Pierfrancesco Beneventano
General AI
Organizational knowledge used by AI agents typically lacks epistemic structure: retrieval systems surface semantically relevant content without distinguishing binding decisions from abandoned hypotheses, contested claims from settled ones, or known facts from unresolved questions. We argue that the ceiling on organizat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-14 · Farbod Alinezhad, Jianfei Cao, Gary J. Young, Brady Post
General AI
Predicting counterfactual outcomes in longitudinal data, where sequential treatment decisions heavily depend on evolving patient states, is critical yet notoriously challenging due to complex time-dependent confounding and inadequate uncertainty quantification in existing methods. We introduce the Causal Diffusion Mode…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-14 · Yecheng Wu, Song Han, Hai Cai
General AI
On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, standard OPD requires a live teacher inference server throughout training, resulting in substantial infrastructure overhead. In this work, we investigate whether on-policy distillation can be performed of…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-14 · Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain
General AI
Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Although existing automated program repair techniques primarily focus on repairing memory corruption vulnerabilities, they struggle with logical vulnerabilities because of their …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-16 · Fabrizio Genilotti, Arianna Stropeni, Gionata Grotto, Francesco Borsatti, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto
General AI
The reliability of a machine vision system for autonomous driving depends heavily on its training data distribution. When a vehicle encounters significantly different conditions, such as atypical obstacles, its perceptual capabilities can degrade substantially. Unlike many domains where errors carry limited consequence…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-16 · Jianxuan Yang, Xinyue Guo, Zhi Cheng, Kai Wang, Lipan Zhang, Jinjie Hu, Qiang Ji, Yihua Cao, Yihao Meng, Zhaoyue Cui, Mengmei Liu, Meng Meng, Jian Luan
General AI
Recent advances in video-to-audio (V2A) generation enable high-quality audio synthesis from visual content, yet achieving robust and fine-grained controllability remains challenging. Existing methods suffer from weak textual controllability under visual-text conflict and imprecise stylistic control due to entangled tem…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-16 · Boyan Li, Ou Ocean Kun Hei, Yue Yu, Yuyu Luo
General AI
While Large Language Models (LLMs) demonstrate impressive proficiency in generating SQL queries, they fundamentally lack the capability to self-evaluate correctness without an execution oracle. This limitation creates a stark Generation-Selection Gap, where high potential accuracy (Pass@K) fails to translate into execu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-17 · Tianqi Luo, Leixian Shen, Yuyu Luo
General AI
Agentic visual analytics (VA) represents an emerging class of systems in which large language model (LLM)-driven agents autonomously plan, execute, evaluate, and iterate across the full visual analytics pipeline. By shifting users from low-level tool operations to high-level analytical goals expressed through natural l…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-17 · Thomas Bayer, Alexander Lohr, Sarah Weiß, Bernd Michelberger, Wolfram Höpken
General AI
Explaining Machine Learning (ML) results in a transparent and user-friendly manner remains a challenging task of Explainable Artificial Intelligence (XAI). In this paper, we present a method to enhance the interpretability of ML models by using a Knowledge Graph (KG). We store domain-specific data along with ML results…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-19 · Ziao Zhang, Kou Shi, Shiting Huang, Avery Nie, Yu Zeng, Yiming Zhao, Zhen Fang, Qishen Su, Haibo Qiu, Wei Yang, Qingnan Ren, Shun Zou, Wenxuan Huang, Lin Chen, Zehui Chen, Feng Zhao
Research Track A · General AI
As the capability frontier of autonomous agents continues to expand, they are increasingly able to complete specialized tasks through plug-and-play external skills. Yet current benchmarks mostly test whether models can use provided skills, leaving open whether they can discover skills from experience, repair them after…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-20 · Raghvendra Kumar, Devankar Raj, Sriparna Saha
General AI
India's linguistic landscape, spanning 22 scheduled languages and hundreds of marginalized dialects, has driven rapid growth in NLP datasets, benchmarks, and pretrained models. However, no dedicated survey consolidates resources developed specifically for Indian languages. Existing reviews either focus on a few high-re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-20 · Aditya Arora, Akshita Gupta, Pau Rodriguez, Marcus Rohrbach
General AI
Story Visualization aims to generate a sequence of images that faithfully depicts a textual narrative while preserving character identity, spatial configuration, and stylistic coherence as the narrative unfolds. Maintaining such cross-frame consistency has traditionally relied on explicit memory banks, architectural expan…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-20 · Weicheng Lin, Yi Zhang, Jiawei Dang, Liang-Jie Zhang
General AI
Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning method for large language models, with its effectiveness largely influenced by the allocation of ranks and scaling factors, as well as initialization. Existing LoRA variants typically address only one of these factors, often at the c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-21 · Yuan Zhuang, Yuexin Bian, Sihong He, Jie Feng, Qing Su, Songyang Han, Jonathan Petit, Shihao Ji, Yuanyuan Shi, Fei Miao
General AI
Scaling critic capacity is a promising direction for enhancing off-policy reinforcement learning (RL). However, larger critics are prone to overfitting and unstable in replay-buffer-based bootstrap training. This paper leverages Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy critics. Our…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-21 · Mehdi Maboudi, Said Harb, Jackson Ferrao, Kourosh Khoshelham, Yelda Turkan, Karam Mawas
General AI
Point cloud registration involves aligning one point cloud with another or with a three-dimensional (3D) model, enabling the integration of multimodal data into a unified representation. This is essential in applications such as construction monitoring, autonomous driving, robotics, and virtual or augmented reality (VR…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-21 · Abhinav Agarwal
General AI
LLM-assisted defect discovery has a precision crisis: plausible-but-wrong reports overwhelm maintainers and degrade credibility for real findings. We present Refute-or-Promote, an inference-time reliability pattern combining Stratified Context Hunting (SCH) for candidate generation, adversarial kill mandates, context a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-23 · Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi
General AI
Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Therefore, it is necessary to develop automated event extraction approaches. However, existing datasets for algorithm development…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-23 · Alibay Osmanli, Zixu Cheng, Shaogang Gong
General AI
Physical video understanding requires more than naming an event correctly. A model can answer a question about pouring, sliding, or collision from textual regularities while still failing to localize the event in time or space. We introduce a grounded benchmark for physical video understanding that extends the what--wh…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-23 · Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma
General AI
How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual conc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-23 · Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu, Peng Di
General AI
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, extracting actionabl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-24 · Negar Arabzadeh, Andrew Drozdov, Michael Bendersky, Matei Zaharia
General AI
Large Language Models (LLMs) have made query reformulation ubiquitous in modern retrieval and Retrieval-Augmented Generation (RAG) pipelines, enabling the generation of multiple semantically equivalent query variants. However, executing the full pipeline for every reformulation is computationally expensive, motivating …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-24 · Hyo Jin Jon, Longbin Jin, Eun Yi Kim
General AI
CLIP has demonstrated strong generalization in visual domains through natural language supervision, even for video action recognition. However, most existing approaches that adapt CLIP for action recognition have primarily focused on temporal modeling, often overlooking spatial perception. In real-world scenarios, visu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-24 · Chengyang Li, Kaiyi Xiong, Yuan Xu, Lei Qian, Yizhou Wang, Wentao Zhu
General AI
Embodied foundation models have achieved significant breakthroughs in robotic manipulation, yet they still depend heavily on large-scale robot demonstrations. Although recent works have explored leveraging human data to alleviate this dependency, effectively extracting transferable knowledge remains a significant chall…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-24 · Manyi Zhang, Ji-Fu Li, Zhongao Sun, Xiaohao Liu, Zhenhua Dong, Xianzhi Yu, Haoli Bai, Xiaobo Xia
General AI
Autonomous agent systems such as OpenClaw introduce significant efficiency challenges due to long-context inputs and multi-turn reasoning. This results in prohibitively high computational and monetary costs in real-world development. While quantization is a standard approach for reducing cost and latency, its impact on…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-27 · Zhou Ziheng, Huacong Tang, Jinyuan Zhang, Haowei Lin, Bangcheng Yang, Qian Long, Fang Sun, Yizhou Sun, Yitao Liang, Ying Nian Wu, Demetri Terzopoulos, Xiaofeng Gao
Research Track A · General AI
Discovering causal regularities and applying them to build functional systems (the discovery-to-application loop) is a hallmark of general intelligence, yet evaluating this capacity has been hindered by the vast complexity gap between scientific discovery and real-world engineering. We introduce SciCrafter, a Minecraft…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-27 · Amal Akli, Mike Papadakis, Maxime Cordy, Yves Le Traon
General AI
Large language models are widely used for code generation, yet they rely on an implicit assumption that the task descriptions are sufficiently detailed and well-formed. However, in practice, users may provide defective descriptions, which can have a strong effect on code correctness. To address this issue, we develop S…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-27 · Shiyi Zhang, Yiji Cheng, Tiankai Hang, Zijin Yin, Runze He, Yu Xu, Wenxun Dai, Yunlong Lin, Chunyu Wang, Qinglin Lu, Yansong Tang
General AI
Unified multi-modal understanding/generative models have shown improved image editing performance by incorporating fine-grained understanding into their Chain-of-Thought (CoT) process. However, a critical question remains underexplored: what forms of CoT and training strategy can jointly enhance both the understanding …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-28 · Jiahang Lin, Shichun Liu, Chengjun Pan, Lizhi Lin, Shihan Dou, Xuanjing Huang, Hang Yan, Zhenhua Han, Tao Gui
General AI
Harnesses have become a central determinant of coding-agent performance, shaping how models interact with repositories, tools, and execution environments. Yet automating harness engineering is hard: a heterogeneous action space, sparse and noisy evaluation signal, multi-million-token trajectories, and edits whose effec…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-28 · Ajmain Inqiad Alam, Palash Roy, Chanchal K. Roy, Banani Roy, Kevin A. Schneider
General AI
The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities in different SE tasks, they are unmanageably large, slow to deploy, memory-intensive, and carbon-heavy. Thi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-28 · Jinxiang Meng, Shaoping Huang, Fangyu Lei, Jingyu Guo, Haoxiang Liu, Jiahao Su, Sihan Wang, Yao Wang, Enrui Wang, Ye Yang, Hongze Chai, Jinming Lv, Anbang Yu, Huangjing Zhang, Yitong Zhang, Yiming Huang, Zeyao Ma, Shizhu He, Jun Zhao, Kang Liu
General AI
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and assumption of perfect intent. To bridge these gaps, we introduce DV-World, a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-28 · Pahal D. Patel, Sanmay Ganguly
General AI
Graph neural networks such as ParticleNet and transformer-based networks on point clouds such as ParticleTransformer achieve state-of-the-art performance on jet tagging benchmarks at the Large Hadron Collider, yet the physical reasoning behind their predictions remains opaque. We present different methods, i.e. perturb…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-28 · Nazia Shehnaz Joynab, Soneya Binta Hossain
General AI
Resolving complex post-production issues in large-scale open-source software (OSS) projects requires significant cognitive effort, because developers must first work through long, unstructured, and fragmented issue discussion threads. In this paper, we present SWE-MIMIC-Bench, an issue trajectory dataset generated…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-29 · My Thi Diem Phan, Trung Tuyen Truong, Hoai Phuong Ha, Dat Thanh Nguyen
General AI
Norway's electricity market is heavily dominated by hydropower, but the 2021–2022 energy crisis and stronger integration with Continental Europe have fundamentally altered price formation, reducing the reliability of forecasting models calibrated on historical data. Despite the critical need for updated models, a unif…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-29 · Darren Fürst, Sebastian Steindl, Ulrich Schäfer
General AI
Speech Sound Disorders (SSD) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular multi-task SLPHelmUltraSuitePlus benchmark. We propose a cascading approach from binary …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-29 · Raj Kumar Ranabhat, Tayler D Ross, Tony Jiao, Jeremie Larouche, Joel Finkelstein, Michael Hardisty
General AI
Surgical training involves didactic teaching, mentor-led learning, surgical skills laboratories, and direct exposure to surgery; however, increasing clinical pressures have limited operating room (OR) exposure. This work leverages virtual reality (VR) to provide a safe and immersive training environment. Existing VR tr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-29 · Wanyue Zhang, Wenxiang Wu, Wang Xu, Jiaxin Luo, Helu Zhi, Yibin Huang, Shuo Ren, Zitao Liu, Jiajun Zhang
General AI
Vision-language models (VLMs) have shown strong performance on static visual understanding, yet they still struggle with dynamic spatial reasoning that requires imagining how scenes evolve under egocentric motion. Recent efforts address this limitation either by scaling spatial supervision with synthetic data or by cou…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-30 · Chenxin Li, Zhengyang Tang, Huangxin Lin, Yunlong Lin, Shijue Huang, Shengyuan Liu, Bowen Ye, Rang Li, Lei Li, Benyou Wang, Yixuan Yuan
General AI
LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-30 · Han Liu, Shanghao Shi, Yevgeniy Vorobeychik, Chongjie Zhang, Ning Zhang
General AI
Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved the training efficiency of Large Language Models (LLMs) by updating neural network layers using low-rank matrices. Since the generation of adversarial examples is an optimiz…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-30 · Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin
General AI
Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are increasingly applied to critical tasks such as function and variable name recovery and type inference. However, despite the…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-04-30 · Zainab Rehan, Christian Medeiros Adriano, Sona Ghahremani, Holger Giese
General AI
Rule-based systems remain central in safety-critical domains but often struggle with scalability, brittleness, and goal misspecification. These limitations can lead to reward hacking and failures in formal verification, as AI systems tend to optimize for narrow objectives. In previous research, we developed a neuro-sym…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-05-01 · Yutao Hou, Yihan Jiang, Yuhan Xie, Jian Yang, Liwen Zhang, Hailiang Huang, Guanhua Chen, Yun Chen
General AI
Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-05-01 · Theodore Papamarkou, Pierre Alquier, Matthias Bauer, Wray Buntine, Andrew Davison, Gintare Karolina Dziugaite, Maurizio Filippone, Andrew Y. K. Foong, Vincent Fortuin, Dimitris Fouskakis, Jes Frellsen, Eyke Hüllermeier, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Nikita Kotelevskii, Salem Lahlou, Yingzhen Li, Fang Liu, Clare Lyle, Thomas Möllenhoff, Konstantina Palla, Maxim Panov, Yusuf Sale, Kajetan Schweighofer, Artem Shelmanov, Siddharth Swaroop, Martin Trapp, Willem Waegeman, Andrew Gordon Wilson, Alexey Zaytsev
General AI
LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this p…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-04 · Mengzhuo Chen, Junjie Wang, Zhe Liu, Yawen Wang, Qing Wang
General AI
LLM-based agents increasingly rely on harnesses that provide execution environments, tool interfaces, context, lifecycle orchestration, observability, verification, and governance. Existing self-improving agents and automatic harness evolution methods mainly improve agents through runtime supervision, prompt optimizati…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-04 · Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin, Jocelyn Shen, Blake A. Richards, Alison Gopnik, Doina Precup
General AI
A long-standing finding in the causal learning literature is that adults struggle to identify conjunctive causal rules, where an effect requires the simultaneous presence of multiple causes, while performing better in disjunctive settings. However, most demonstrations of this "conjunctive handicap" rely on passive ob…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-04 · Ziwen Kan, Wugeng Zheng, Tianlong Chen, Song Wang
General AI
In healthcare, multimodal time series tasks often operate on incomplete observations in practice, for example when ECG segments are lost because electrodes detach or an entire respiratory channel is unavailable during overnight monitoring. Such missingness typically appears in two structurally distinct patterns: within…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-04 · Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter
General AI
Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on the Group Relative Policy Optimization (GRPO) algorithm or modifications thereof to steer the models to produce Chain-of-Thought (CoT) traces. The final answer can only be verified, an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-04 · Shubham Gaur, Ian Lane
Research Track B · General AI
Web agents operating over long horizons ingest raw DOM and accessibility trees -- routinely tens of thousands of tokens -- at every action step, causing progressive context degradation that erodes reasoning well before tasks complete. We argue that this coupling of observation frequency to action frequency is an archit…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-06-04 · Eric Spencer, Arslan Bisharat, Brian Ortiz, Khushboo Bhadauria, TaiNing Wang, George K. Thiruvathukal, Konstantin Laufer, Mohammed Abuhamad
General AI
TLA+ is a formal specification language for verifying distributed systems and safety-critical protocols. Large language models (LLMs) frequently produce TLA+ specifications that fail the TLC model checker for semantic reasons. Across 25 LLMs, the best public baseline is 26.6% syntactic parse and 8.6% semantic model-che…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-04 · Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu, Huaxiu Yao, Zhiwu Lu, Mingyu Ding
General AI
Robot manipulation alternates between low-risk transit phases that call for fast execution and high-risk contact stages that demand slow, precise motion. Yet existing Vision-Language-Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-04 · Chenming Zhu, Jingli Lin, Yilin Long, Peizhou Cao, Tai Wang, Jiangmiao Pang, Xihui Liu
General AI
While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason from alternative viewp…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-05 · Gonzalo Mancera, Daniel DeAlcala, Aythami Morales, Julian Fierrez, Ruben Tolosana, Francisco Jurado
General AI
We present LoRA-MINT, a new methodology for Membership Inference Test (MINT) applied to recent Large Language Models (LLMs) fine-tuned for specific Natural Language Processing (NLP) tasks through Low-Rank Adaptation (LoRA). The primary goal is to assess whether individual samples were part of the training data of these…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-06-05 · Arthur Bouton, Tristan D. Hasseler, Michael Paton, Travis Brown, Jacob Levy, William Reid, Joshua Martin, Hari Nayar
Research Track A · General AI
This paper presents ERNEST, a four-wheeled planetary rover concept equipped with a two-degree-of-freedom Active Gimbal Suspension that combines yaw and roll actuation to enable wheel reconfiguration, steering, and active load redistribution. A single neural network controller, trained to track a desired path across cha…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-06-08 · Jisong Cai, Long Ling, Shiwei Chu, Zhongshan Liu, Jiayue Kang, Zhixuan Liang, Wenjie Xu, Yinan Mao, Weinan Zhang, Xiaokang Yang, Ru Ying, Ran Zheng, Yao Mu
General AI
World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world prediction and action execution at the same temporal resolution, forcing the world branch…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-08 · Abhinav Mishra, Kumar Sharad
General AI
Delegation-scoped execution is not identifiable from standard observables: audit logs and execution traces can be identical under multiple incompatible delegation assignments. This gap is especially acute in LLM-based agentic systems, where agents dynamically select tools, vary execution sequences across runs for the s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-08 · Pu Ning, Quan Chen, Kun Tao, Xinyu Tang, Tianshu Wang, Qianggang Cao, Xinyu Kong, Zujie Wen, Zhiqiang Zhang, Jun Zhou
General AI
Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and r…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-08 · Zhenyu Wu, Xiuwei Xu, Yukun Zhou, Yifan Li, Qiuping Deng, Xiaofeng Wang, Zheng Zhu, Bingyao Yu, Ziwei Wang, Jiwen Lu, Haibin Yan
General AI
Embodied world models have emerged as a pivotal paradigm for visual robotic decision-making and interactive environment simulation. However, conventional embodied frameworks rely on low-dimensional structured action vectors (e.g., joint angles and end-effector poses), which suffer from limited expressive capacity, poor…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-09 · Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An, Yihang Chen, Cho-Jui Hsieh
General AI
Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token can be non-unique, noisy, or misaligned with the model prior. Strictly fitting toward this one-hot target may be suboptimal, especially when the pretrained model encodes a rich knowled…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-09 · Weixian Xu, Shilong Liu, Mengdi Wang
General AI
In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-dataset settings, while real-world applications require models to handle heterogeneous input str…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-13 · Yuhong Jiang, Zhishu Shen, Tong Yin, Qiushi Zheng, Yichao Jin, Fidan Mehmeti, Jiong Jin
General AI
The rapid growth of remote sensing data in Low Earth Orbit (LEO) satellite networks is increasingly constrained by limited downlink capacity to terrestrial networks. Satellite edge computing alleviates this pressure by enabling in-orbit data processing. However, it introduces a new challenge of spatio-temporal resource…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.3
2026-06-15 · Xiuwei Xu, Haowen Sun, Angyuan Ma, Yiwei Zhang, Zhenyu Wu, Xiaofeng Wang, Bingyao Yu, Zheng Zhu, Jie Zhou, Jiwen Lu
General AI
Spatial generalization is critical for imitation-learned manipulation policies, but achieving it typically requires scaling demonstrations across diverse object poses, robot configurations, and camera viewpoints. Data augmentation from a few source demonstrations offers a practical alternative to costly real-world coll…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-15 · Naiyu Yin, Dennis Wei, Tian Gao, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Yue Yu
General AI
A prominent research direction in mechanistic interpretability is learning sparse circuits over LLM components to reveal how they jointly produce model behavior. However, raw neurons are polysemantic, making learned circuits hard to interpret. Sparse autoencoder (SAE) features alleviate this, but their high dimensional…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.3
2026-06-16 · Xiaojun Jia, Jie Liao, Simeng Qin, Ke Ma, Wenbo Guo, Yebo Feng, Aishan Liu, Yang Liu
General AI
Agent skills are emerging as an important attack surface in LLM-based systems. Through an empirical study of existing skill scanners, we find that current defenses primarily rely on textual descriptions, manifests, and source code as the main signals for security analysis, which can leave visually conveyed malicious in…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.2
2026-06-20 · Colin Samplawski, Ramneet Kaur, Manoj Acharya, Anirban Roy, Adam D. Cobb
General AI
Large multi-modal language models are increasingly deployed in high-stakes domains, making well-calibrated uncertainty essential. Traditional Bayesian methods approximate posteriors over all model weights, which becomes intractable for modern large models. For this reason, recent work instead considers Bayesian low-ran…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-06-23 · Laura Colazzo, Giuseppe Anzillotti
General AI
Agentic Web Browsers (AWBs), powered by Large Language Models (LLMs), are emerging as autonomous systems capable of navigating the Web on behalf of users. Beyond enhancing productivity, they could also offer significant promise as Assistive Technologies (ATs) for visually-impaired individuals, transforming web interact…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-06-23 · Negin Raoof, Richard Zhuang, Marianna Nezhurina, Etash Guha, Atula Tejaswi, Ryan Marten, Charlie F. Ruan, Tyler Griggs, Alexander Glenn Shaw, Hritik Bansal, E. Kelly Buchanan, Artem Gazizov, Reinhard Heckel, Chinmay Hegde, Sankalp Jajee, Daanish Khazi, Emmanouil Koukoumidis, Xiangyi Li, Hange Liu, Shlok Natarajan, Harsh Raj, Nicholas Roberts, Ethan Shen, Nishad Singhi, Michael Siu, Ashima Suvarna, Hanwen Xing, Patrick Yubeaton, Robert Zhang, Leon Liangyu Chen, Xiaokun Chen, Steven Dillmann, Saadia Gabriel, Xunyi Jiang, Anurag Kashyap, Boxuan Li, Yein Park, Minh Pham, Sujay Sanghavi, Lin Shi, Ke Sun, Yixin Wang, Zhiwei Xu, Erica Zhang, Siyan Zhao, Wanjia Zhao, Jenia Jitsev, Alex Dimakis, Benjamin Feuer, Ludwig Schmidt
General AI
Agentic language models dramatically expand the applications of AI yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that ge…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-06-23 · Yikai Lu, Yifei Wu, Xinyu Lu, Tongxin Li
General AI
In the big-world regime, agents cannot be universally capable and their ability is inevitably specialized across a world model in pieces. Consequently, standard uniform guarantees fail to distinguish between the understanding of critical bottlenecks and irrelevant failures. We first formalize this limitation by proving…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-06-24 · Aditya Singh, Gerson Kroiz, Senthooran Rajamanoharan, Neel Nanda
General AI
A central goal of safety research is determining whether a model is misaligned. Prior work has largely focused on detecting concerning behavior. But behavior alone does not establish misalignment: a concerning action can arise from benign causes such as confusion. This motivates model forensics: investigating whether t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-06-24 · Andrei Liviu Nicolicioiu, Mohammad Pezeshki, Aaron Courville
General AI
On-policy self-distillation achieves strong pass@1 accuracy by using a single model as both teacher and student, with the teacher conditioned on a correct demonstration to provide dense token-level feedback. We show that this could come at a hidden cost: rollout diversity decreases and pass@k curves flatten (i.e., gene…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.2
2026-06-24 · Seth Dobrin, Łukasz Chmiel
General AI
AI agents are granted access to tools, APIs, and other infrastructure, making them active principals in those systems. The dominant approach places controls inside the agent's own runtime: system prompts, output filters, and guardrail libraries. Any control in the agent's address space is reachable by inputs that influ…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-03-24 · Connor Mclaughlin, Nigel Lee, Lili Su
Research Track A
Machine learning models often need to adapt to new data after deployment due to structured or unstructured real-world dynamics. The Continual Learning (CL) framework enables continuous model adaptation, but most existing approaches either assume each task contains sufficiently many data samples or that the learning tas…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-04-09 · Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo
Research Track A · General AI
Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. A field-wide goal is to achieve generalizable, cross-subject models. A major obstacle towards this goal is the substanti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-04-14 · Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang
Research Track A
Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal transferability across platforms. In this paper, we introduce TCL, a novel efficient an…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 9.0
2026-04-15 · Ahmadreza Eslaminia, Kuan-Chieh Lu, Klara Nahrstedt, Chenhui Shao
Research Track A
Ultrasonic metal welding (UMW) is widely used in industrial applications but is sensitive to tool wear, surface contamination, and material variability, which can lead to unexpected process faults and unsatisfactory weld quality. Conventional monitoring systems typically rely on supervised learning models that assume a…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.0
2026-05-08 · Andrea Gurioli, Federico Pennino, Maurizio Gabbrielli
General AI
Embedding-based code retrieval often suffers when encoders overfit to surface syntax. Prior work mitigates this by using LLMs to rephrase queries and corpora into a normalized style, but leaves two questions open: how much representational shift helps, and when is the per-query LLM call justified? We study a hierarchy …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-05-12 · Zhennan Chen, Junwei Zhu, Xu Chen, Jiangning Zhang, Jiawei Chen, Zhuoqi Zeng, Wei Zhang, Chengjie Wang, Jian Yang, Ying Tai
General AI
Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.0
2026-05-12 · Lezhong Wang, Mehmet Onurcan Kaya, Siavash Bigdeli, Jeppe Revall Frisvad
General AI
Recent single-image relighting methods, powered by advanced generative models, have achieved impressive photorealism on synthetic benchmarks. However, their effectiveness in the complex visual landscape of the real world remains largely unverified. A critical gap exists, as current datasets are typically designed for m…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.0
2026-05-23 · Bill Psomas, Dionysis Christopoulos, Thanasis Petropoulos, Nikos Efthymiadis, Ioannis Kakogeorgiou, Ondřej Chum, Yannis Avrithis, Giorgos Tolias, Konstantinos Karantzalos
General AI
Remote sensing composed image retrieval (RSCIR) enables search in large satellite image archives using composed queries that combine a reference image with a textual modifier. Although RSCIR offers a flexible interface for expressing targeted retrieval intent, the transferability of modern composition methods to Earth …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-05-24 · Yubo Li, Yidi Miao
Research Track B · General AI
Long-horizon LLM inference turns the key--value (KV) cache into the dominant GPU memory consumer and makes per-token attention increasingly expensive. Many common eviction policies use static recency windows or historical attention, leaving unused a signal computed on every decoding step: the model's current uncertaint…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-05-27 · Rajarshi Chowdhury, Akshay Shah, Zakaria Alrmaih, Chenhao Guo, Anubhav Singh, Sue Lee
Research Track A · General AI
Oracle Exadata consolidates thousands of tenant databases onto shared storage infrastructure deployed at hundreds of customer sites worldwide. Oracle Multitenant architecture enables this extreme density, with thousands of tenant databases sharing a single Exadata storage system -- but this creates a multi-level resour…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-05-27 · Ngoc Phan Phuoc Loc, Toan Huynh La Viet, Thanh Tran Khanh, Duy A Nguyen, Tuan Anh Nguyen Pham, Thanh Nguyen, Nitesh V. Chawla, Wray Buntine, Kok-Seng Wong, Khoa D. Doan, Binh T. Nguyen
General AI
The rapid growth in submissions to machine learning venues has strained the scientific peer-review system and intensified interest in LLM-based automated peer reviewers. However, how good these systems actually are, especially compared to human reviewers at catching scientific gaps, remains poorly understood. In this w…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-05-28 · Zhenyu Sun, Zheng Xu, Ermin Wei
Research Track A · General AI
Reinforcement Learning from Human Feedback (RLHF) typically relies on static reward models to align Large Language Models with human preferences. However, human values are inherently diverse and heterogeneous, and a single reward model often lacks the robustness required to generalize to unseen preference domains. Whil…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-05-29 · Wai-Chung Kwan, Aryo Pradipta Gema, Joshua Ong Jun Leang, Pasquale Minervini
General AI
Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We introduce SCOPE, a data-free self-play framework for open-ended tasks that co-evolves two policies: a Challenger …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-06-11 · Xiaobin Zhang, Lefei Shen, Mouxiang Chen, Zhuo Li, Hongkai Li, Han Fu, Jianling Sun, Xiaoxue Ren, Chenghao Liu
Research Track A · General AI
Driven by conservative over-provisioning to guarantee service reliability, resource utilization in cloud data centers remains at low levels. To mitigate this, the forecast-then-optimize paradigm has emerged to optimize consolidation by anticipating future demands. While emerging time series foundation models promise to…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-06-13 · Ida Momennejad, Roberta Raileanu
Research Track A
Open-ended intelligence is the capacity to adapt to novel problems and environments that are substantially different from those in training. A mathematics of open-ended intelligence requires two pillars: first, a minimal set of representational primitives (e.g., states, actions) and algorithmic primitives (e.g., neares…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-06-14 · Fendi Tsim, Alina Gutoreva
Research Track A
We introduce SCAN -- a human-centric decision-making framework to facilitate learners for effective task allocation with Generative Artificial Intelligence (GenAI) based on Vygotsky's Zone of Proximal Development and Metacognition. In SCAN, we systematize and formalize AI-human interaction by introducing a task-identif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-06-29 · Tianyu Wang, Gourav Rattihalli, Aditya Dhakal, Longfei Shangguan, Dejan Milojicic
Research Track A
As LLM inference becomes a major cloud workload, its growing energy footprint makes cluster-wide energy optimization increasingly important. Serverless LLM serving helps platforms absorb traffic volatility by elastically sharing GPU resources across models, but this sharing also makes energy optimization difficult. Mul…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-05-04 · Mingming Zha, Xiaofeng Wang
General AI
Autonomous LLM agents operate as long-running processes with persistent workspaces, memory files, scheduled task state, and messaging integrations. These features create a new propagation risk: attacker-influenced content can be written into persistent agent state, re-enter the LLM decision context through scheduled au…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-05-04 · Quang Hieu Pham, Yang He, Ping Nie, Canwen Xu, Davood Rafiei, Yuepeng Wang, Xi Ye, Jocelyn Qiaochu Chen
General AI
Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline where schema elements are retrieved once upfront and the database is only revisited for post-hoc repair, limiting recovery fr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-05-04 · Tienyu Chang, Zhen Chen, Renjie Liang, Jinyu Ding, Jie Xu, Sunu Mathew, Amir Reza Hajrasouliha, Andrew J. Saykin, Ruogu Fang, Yu Huang, Jiang Bian, Qingyu Chen
General AI
The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-05-04 · Yingtian Shi, Abivishaq Balasubramanian, Jessica Herring, Jiachen Li, Juan Macias Romero, Rosemarie Santa Gonzalez, Varun Mishra, Agata Rozga, Xiang Zhi Tan, Thomas Plötz
General AI
Human activity recognition (HAR) in smart homes remains challenging because many daily activities exhibit similar local sensor patterns, while minimally intrusive sensing provides sparse and ambiguous observations. As a result, methods based on short temporal or event windows often fail to capture the broader temporal …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-05-04 · Pehuén Moure, Niclas Pokel, Bilal Bounajma, Yingqiang Gao, Roman Boehringer, Longbiao Cheng, Shih-Chii Liu
General AI
Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models can make use of such information. We int…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.8
2026-05-06 · Srikar Kashyap Pulipaka
General AI
We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language mode…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-05-07 · Xiaofang Xiao, Guangchao Li, Guangrong Zhao, Qi Lin, Wen Ma, Hongkai Wen, Yanxiang Wang, Yiran Shen
General AI
Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-05-22 · Haoyuan Wang, Xiaohao Liu, Jiajie Su, Jianmao Xiao, Chaochao Chen
General AI
Multimodal large language models (MLLMs) need efficient mechanisms to update knowledge without degrading existing capabilities. While intrinsic multimodal knowledge editing achieves strong reliability and locality, it often exhibits limited generality, failing to propagate edits across semantically equivalent visual an…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-05-28 · Yaxin Luo, Jiacheng Cui, Xiaohan Zhao, Xinyi Shang, Jiacheng Liu, Xinyue Bi, Zhaoyi Li, Zhiqiang Shen
General AI
The pretraining data mixture of Large Language Models (LLMs) constitutes their "digital DNA", shaping model behaviors, capabilities, and failure modes. Yet this composition is rarely disclosed, making post-hoc auditing of data combination or provenance difficult. In this work, we formalize Data Mixture Surgery…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-05-29 · Jiazheng Xing, Hangjie Yuan, Lingling Cai, Xinyu Liu, Yujie Wei, Fei Du, Hai Ci, Tao Feng, Jiasheng Tang, Weihua Chen, Fan Wang, Yong Liu
General AI
Connector-based video unified models have demonstrated strong capability in instruction-grounded video synthesis, but integrating a large high-fidelity generator into the unified training loop is computationally prohibitive, limiting achievable visual quality. We therefore propose Lumos-Nexus, a training-efficient unif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-05-29 · Zhenghua Bao, Fengya Tian, Chris Zhang, Zhenjun Chen, Xile Ma, Yi Shi
General AI
The rapid development of large language models, each with distinct capabilities and inference costs, raises a practical deployment question: given an incoming request, which model should handle it? We present OrcaRouter, a production-oriented LLM router that combines a LinUCB-based contextual bandit over lexical and se…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-05-29 · Hengbo Xu, Shengjie Jin, Yanbiao Ma, Zhiwu Lu
General AI
With the rapid advancement of large multimodal models (LMMs), inference-time overhead has become a key bottleneck for real-world deployment. Existing methods typically prune visual tokens at prefill, assuming the required visual evidence remains static during reasoning. However, we empirically show that visual evidence…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-05-29 · Chu Fei Luo, Samuel Dahan, Xiaodan Zhu
General AI
Test-time reasoning has become a significant field of study since the introduction of chain-of-thought reasoning in large language models (LLMs). However, the mechanisms of this reasoning process are still under-explored -- from the same input prompt, and even the same partial solution, LLMs can produce varied answers …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-06-28 · Satish Narayana Srirama
General AI
Fog computing utilizes proximal computational resources for sensor data processing and actuation, and addresses the latency, network load, and privacy issues of the cloud-centric Internet of Things. On the other hand, Large Language Models (LLMs) are a type of deep learning AI model trained on enormous text dat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-06-29 · Huaijie Wang, Shusheng Xu, Yi Wu, Kaifeng Lyu
General AI
A key step toward artificial general intelligence is to train models that can perform multiple tasks. In this paper, we study how to build such models by first training separate RL experts for individual tasks and then consolidating them via distillation, as an alternative to directly training a single model on mixed t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-06-29 · Kunyang Li, Kyle Domico, Jonathan Gregory, Patrick McDaniel
General AI
Multi-agent systems (MAS) are increasingly used to automate complex, distributed workflows. However, their inter-agent communication channels introduce new attack surfaces that remain poorly understood and are difficult to defend against. In this paper, we address how defenders should prioritize limited security effort…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-06-29 · Jameel Hassan, Yasiru Ranasinghe, Vishal Patel
General AI
3D Gaussian Splatting (3DGS) has emerged at the forefront of 3D scene reconstruction. Extending 3DGS with language-driven, open-vocabulary understanding has gained significant attention for real-world applications such as embodied AI. Recent methods achieve this by learning an instance feature attribute and assigning s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-06-29 · Ziwei Su, Junyu Ren, Victor Veitch
General AI
Contrastive embedding models trained with scale-invariant losses are typically paired with distance metrics like cosine similarity, effectively ignoring embedding magnitudes. However, surprisingly, empirical studies reveal that despite this, these "discarded" norms seem to correlate with semantic properties such as con…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-06-29 · Yihao Wang, Yuheng Ji, Mingyu Cao, Yanqing Shen, Runze Xiao, Huaihai Lyu, Senwei Xie, Euan Liu, Klara Tian, Tianfeng Long, Yichi Zhang, Zhengliang Cai, Ruike Chen, Jifan Zhao, Ruochuan Shi, Zihan Tang, Jing Lyu, Wenxing Tan, Ningbo Zhang, Yangtao Hu, Yuming Gao, Xiansheng Chen, Junkai Zhao, Congsheng Xu, Boan Zhu, Ziqi Wang, Yupu Feng, Qiongqiong Zhang, Yingli Zhao, Yulong Ao, Shaoxuan Xie, You Liu, Guocai Yao, Leiduo Zhang, Xiaodan Liu, Yunyan Zhang, Yance Jiao, Xinyan Yang, Jiaxing Wei, Xu Liu, Tengfei Pan, Shaokai Nie, Chunlei Men, Sen Cui, Xiaojie Jin, Hongyang Li, Jianlan Luo, Yao Mu, Yunchao Wei, Jun Yan, Hang Zhao, Xiaolong Zheng, Jiaming Li, Yonghua Lin, Tiejun Huang, Zhongyuan Wang, Pengwei Wang
General AI
We introduce Orca, an initial instantiation of a general world foundation model. Orca learns a unified world latent space from multimodal world signals and exposes it through multimodal readout interfaces. Rather than optimizing isolated next-token, next-frame, or next-action prediction, we are centered on Next-State-P…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-06-29 · Shanshan Wang, Derek F. Wong, Jingming Yao, Lidia S. Chao
General AI
Traditional automatic evaluation methods have been shown to be unsuitable for modern Chinese poetry because of the distinct nature of this literary genre. Human evaluation remains reliable, but is expensive and not applicable to large-scale data. In this paper, we propose Poller (Poetry LLM Evaluator), a novel method l…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-06-29 · Seunghun Baek, Jihwan Park, Jaeyoon Sim, Hoseok Lee, Seungjoo Lee, Won Hwa Kim
General AI
Multimodal MRI is essential for accurate brain tumor segmentation. However, acquiring all modalities at inference is often challenging in practice, which causes intrinsic uncertainty due to unavoidable information loss. Without modeling this uncertainty, existing methods encode incomplete evidence into deterministic re…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.8
2026-07-02 · Haiyang Li, Yuming Fu, Qun Song, Hongchao Liao, Jing Chen, Mounim A. El-Yacoubi, Xin Jin
General AI
Vein recognition is a secure biometric technology often constrained by limited annotated data and imaging variations. While data augmentation mitigates this, strategies designed for natural images may disrupt the fine-grained topology and textures essential for identity discrimination. We present AGVBench, which evalua…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.6
2026-07-02 · Xi Zhang, Papi Menon, Vivian Chu, Koray Cosguner
General AI
Since ChatGPT's launch in November 2022, open-source agentic frameworks have proliferated, making framework selection important for engineering teams while obscured by popularity signals such as GitHub stars. This paper analyzes 15 major open-source AI agent framework repositories from late 2022 to early 2026, using 80…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-02-22 · Juan Rodriguez, Haotian Zhang, Abhay Puri, Tianyang Zhang, Rishav Pramanik, Meng Lin, Xiaoqing Xie, Marco Terral, Darsh Kaushik, Aly Shariff, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli
General AI
We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. Our benchmark comprises four t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-03-27 · Antoine Edy, Max Conti, Quentin Macé
General AI
While Late Interaction models exhibit strong retrieval performance, many of their underlying dynamics remain understudied, potentially hiding performance bottlenecks. In this work, we focus on two topics in Late Interaction retrieval: a length bias that arises when using multi-vector scoring, and the similarity distrib…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.5
2026-03-30 · Zidi Tao, A. Agung Julius, John T Wen
Research Track A
Sleep is vital for maintaining cognitive function, facilitating metabolic waste removal, and supporting memory consolidation. However, modern societal demands, particularly shift work, often disrupt natural sleep patterns. This can induce excessive sleepiness among shift workers in critical sectors such as healthcare a…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 8.5
2026-03-30 · Kailai Feng, Yuxiang Wei, Bo Chen, Yang Pan, Hu Ye, Songwei Liu, Chenqian Yan, Yuan Gao
General AI
Diffusion models have made significant progress in both text-to-image (T2I) generation and text-guided image editing. However, these models are typically built with billions of parameters, leading to high latency and increased deployment challenges. While on-device diffusion models improve efficiency, they largely focu…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 8.5
2026-03-31 · Shifang Zhao, Yihan Hu, Ying Shan, Yunchao Wei, Xiaodong Cun
General AI
Editing the video content with audio alignment forms a digital human-made art in current social media. However, the time-consuming and repetitive nature of manual video editing has long been a challenge for filmmakers and professional content creators alike. In this paper, we introduce CutClaw, an autonomous multi-agen…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 8.5
2026-04-06 · Andrew Sellergren, Chufan Gao, Fereshteh Mahvar, Timo Kohlberger, Fayaz Jamil, Madeleine Traverse, Alberto Tono, Bashir Sadjad, Lin Yang, Charles Lau, Liron Yatziv, Tiffany Chen, Bram Sterling, Kenneth Philbrick, Richa Tiwari, Yun Liu, Madhuram Jajoo, Chandrashekar Sankarapu, Swapnil Vispute, Harshad Purandare, Abhishek Bijay Mishra, Sam Schmidgall, Tao Tu, Anil Palepu, Chunjong Park, Tim Strother, Rahul Thapa, Yong Cheng, Preeti Singh, Kat Black, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Joelle Barral, Tris Warkentin, Shravya Shetty, Dale Webster, Sunny Virmani, David F. Steiner, Can Kirmizibayrak, Daniel Golden
General AI
We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes and histopathology whole slide images), anatomical localization via bounding boxes, multi-timepoint chest X-ray analysis,…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-09 · Junjie Fei, Jun Chen, Zechun Liu, Yunyang Xiong, Chong Zhou, Wei Wen, Junlin Han, Mingchen Zhuge, Saksham Suri, Qi Qian, Shuming Liu, Lemeng Wu, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Chenchen Zhu
General AI
Adapting Multimodal Large Language Models (MLLMs) for hour-long videos is bottlenecked by context limits. Dense visual streams saturate token budgets and exacerbate the lost-in-the-middle phenomenon. Existing heuristics, like sparse sampling or uniform pooling, blindly sacrifice fidelity by discarding decisive moments …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-10 · Aarush Sinha, Arion Das, Soumyadeep Nag, Charan Karnati, Shravani Nag, Chandra Vadhan Raj, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das
General AI
As large language models (LLMs) are increasingly deployed as autonomous agents, understanding how strategic behavior emerges in multi-agent environments has become an important alignment challenge. We take a neutral empirical stance and construct a controlled environment in which strategic behavior can be directly obse…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-13 · Shuquan Lian, Juncheng Liu, Yazhe Chen, Yuhong Chen, Hui Li
General AI
Prior representative ReAct-style approaches in autonomous Software Engineering (SWE) typically lack the explicit System-2 reasoning required for deep analysis and handling complex edge cases. While recent reasoning models demonstrate the potential of extended Chain-of-Thought (CoT), applying them to the multi-turn SWE …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.5
2026-04-14 · Tianchang Shen, Sherwin Bahmani, Kai He, Sangeetha Grama Srinivasan, Tianshi Cao, Jiawei Ren, Ruilong Li, Zian Wang, Nicholas Sharp, Zan Gojcic, Sanja Fidler, Jiahui Huang, Huan Ling, Jun Gao, Xuanchi Ren
Research Track A
Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, then lifting them to 3D via feed-forward reconstruction techniques. This generative reconstruction approach combines the visual fidelity and creative capacity of video m…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.5
2026-04-15 · Julian Killingback, Ofer Meshi, Henry Li, Hamed Zamani, Maryam Karimzadehgan
Research Track A · General AI
Traditional Retrieval-Augmented Generation (RAG) approaches generally assume that retrieval and generation occur on powerful servers removed from the end user. While this reduces local hardware constraints, it introduces significant drawbacks: privacy concerns regarding data access, recurring maintenance and storage co…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-16 · Ido Galil, Moshe Kimhi, Ran El-Yaniv
General AI
Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of parameter bits. We introduce Deep Neural Lesion (DNL), a data-free and optimization-free method that locates critical parameters, and an enhanced single-pass variant, 1P-DNL, that refines this selection with one forward and backw…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-16 · Yifu Chen, Shengpeng Ji, Qian Chen, Tianle Liang, Yangzhuo Li, Ziqing Wang, Wen Wang, Jingyu Lu, Haoxiao Wang, Xueyi Pu, Fan Zhuo, Zhou Zhao
General AI
End-to-end spoken dialogue models have garnered significant attention because they offer a higher potential ceiling in expressiveness and perceptual ability than cascaded systems. However, the intelligence and expressiveness of current open-source spoken dialogue models often remain below expectations. Motivated by the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-19 · Qingcheng Zeng, Yuheng Lu, Zeqi Zhou, Heli Qi, Puxuan Yu, Fuheng Zhao, Hitomi Yanaka, Weihao Xuan, Naoto Yokoya
General AI
Code-switching is a pervasive linguistic phenomenon in global communication, yet modern information retrieval systems remain predominantly designed for, and evaluated within, monolingual contexts. To bridge this critical disconnect, we present a holistic study dedicated to code-switching IR. We introduce CSR-L (Code-Sw…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-20 · Qingcheng Zeng, Puxuan Yu, Aman Mehta, Fuheng Zhao, Rajhans Samdani
General AI
Instruction-following information retrieval (IF-IR) studies retrieval systems that must not only find documents relevant to a query, but also obey explicit user constraints such as required attributes, exclusions, or output preferences. However, most retrievers are trained primarily for semantic relevance and often fai…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-22 · Wenhong Zhu, Ruobing Xie, Rui Wang, Pengfei Liu
General AI
Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections bet…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.5
2026-04-22 · Daniele Corradetti, Renato Corradetti
Research Track A · General AI
We present a biologically detailed extension of the classical Hopfield/Marr auto-associative memory model for CA3, implementing ten populations (two asymmetric pyramidal subtypes, eight GABAergic interneuron classes), forty-seven compartments, multi-rule plasticity (recurrent Hebb, BCM anti-saturation, mossy-fiber shor…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 8.5
2026-04-23 · Yao Zhang, Zhuchenyang Liu, Thomas Ploetz, Yu Xiao
General AI
The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-langua…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-23 · Kwan Yun, Changmin Lee, Ayeong Jeong, Youngseo Kim, Seungmi Lee, Junyong Noh
General AI
Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, exhibit severe brittleness under stylization. They often mis…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 8.5
2026-04-24 · Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, Yichen Zhu
General AI
Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy evaluation. In this paper, we propose dWorldEval, which uses a discrete diffusion world model as a scalable evaluation p…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-27 · NVIDIA, Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Arushi Goel, Mike Ranzinger, Greg Heinrich, Guo Chen, Lukas Voegtle, Philipp Fischer, Timo Roman, Karan Sapra, Collin McCarthy, Shaokun Zhang, Fuxiao Liu, Hanrong Ye, Yi Dong, Mingjie Liu, Yifan Peng, Piotr Zelasko, Zhehuai Chen, Nithin Rao Koluguri, Nune Tadevosyan, Lilit Grigoryan, Ehsan Hosseini Asl, Pritam Biswas, Leili Tavabi, Yuanhang Su, Zhiding Yu, Peter Jin, Alexandre Milesi, Netanel Haber, Yao Xu, Sarah Amiraslani, Nabin Mulepati, Eric Tramel, Jaehun Jung, Ximing Lu, Brandon Cui, Jin Xu, Zhiqi Li, Shihao Wang, Yuanguo Kuang, Huck Yang, Boyi Li, Hongxu Yin, Song Han, Pavlo Molchanov, Adi Renduchintala, Charles Wang, David Mosallanezhad, Soumye Singhal, Luis Vega, Katherine Cheung, Sreyan Ghosh, Yian Zhang, Alexander Bukharin, Venkat Srinivasan, Johnny Greco, Andre Manoel, Maarten Van Segbroeck, Suseella Panguliri, Rohit Watve, Divyanshu Kakwani, Shubham Pachori, Jeffrey Glick, Radha Sri-Tharan, Aileen Zaman, Khanh Nguyen, Shi Chen, Jiaheng Fang, Qing Miao, Wenfei Zhou, Yu Wang, Zaid Pervaiz Bhat, Varun Praveen, Arihant Jain, Ramanathan Arunachalam, Tomasz Kornuta, Ashton Sharabiani, Amy Shen, Wei Huang, Yi-Fu Wu, Ali Roshan Ghias, Huiying Li, Brian Yu, Nima Tajbakhsh, Chen Cui, Wenwen Gao, Li Ding, Terry Kong, Manoj Kilaru, Anahita Bhiwandiwalla, Marek Wawrzos, Daniel Korzekwa, Pablo Ribalta, Grzegorz Chlebus, Besmira Nushi, Ewa Dobrowolska, Maciej Jakub Mikulski, Kunal Dhawan, Steve Huang, Jagadeesh Balam, Yongqiang Wang, Nikolay Karpov, Valentin Mendelev, George Zelenfroynd, Meline Mkrtchyan, Omri Almog, Bhavesh Pawar, Rameshwar Shivbhakta, Sudeep Sabnis, Ashrton Sharabiani, Negar Habibi, Geethapriya Venkataramani, Pamela Peng, Prerit Rodney, Serge Panev, Richard Mazzarese, Nicky Liu, Michael Fukuyama, Andrii Skliar, Roger Waleffe, Duncan Riach, Yunheng Zou, Jian Hu, Hao Zhang, 
Binfeng Xu, Yuhao Yang, Zuhair Ahmed, Carlo del Mundo, Chad Voegele, Zhiyu Cheng, Nave Assaf, Daniel Afrimi, Natan Bagrov, Ran Zilberstein, Ofri Masad, Eugene Khvedchenia, Borys Tymchenko, Tomer Asida, Parth Mannan, Victor Cui, Michael Evans, Katherine Luna, Jie Lou, Pinky Xu, Guyue Huang, Michael Boone, Pradeep Thalasta, Adeola Adesoba, Dina Yared, Christopher Parisien, Leon Derczynski, Shaona Ghosh, Wes Feely, Micah Schaffer, Barnaby Simkin, Tomasz Grzegorzek, Rishabh Garg, Aastha Jhunjhunwala, Sergei Kolchenko, Farzan Memarian, Haran Kumar, Shiv Kumar, Isabel Hulseman, Anjali Shah, Kari Briski, Padmavathy Subramanian, Joey Conway, Udi Karpas, Jane Polak Scowcroft, Annie Surla, Shilpa Ammireddy, Ellie Evans, Jesse Oliver, Tom Balough, Chia-Chih Chen, Sandip Bhaskar, Alejandra Rico, Bardiya Sadeghi, Seph Mard, Meredith Price, Laya Sleiman, Saori Kaji, Wesley Helmholz, Wendy Quan, Michael Lightstone, Jonathan Cohen, Jian Zhang, Oleksii Kuchaiev, Boris Ginsburg, Jan Kautz, Eileen Long, Mohammad Shoeybi, Mostofa Patwary, Oluwatobi Olabiyi, Andrew Tao, Bryan Catanzaro
Research Track B · General AI
We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements over its predecessor, Nemotron Nano V2 VL, across all modalities, enabled by advances in …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-04-27 · Mohammadmehdi Ataei, Farzaneh Askari, Kamal Rahimi Malekshan, Pradeep Kumar Jayaraman
General AI
Computer-Aided Design (CAD) models are defined by their construction history: a parametric recipe that encodes design intent. However, existing large-scale 3D datasets predominantly consist of boundary representations (B-Reps) or meshes, stripping away this critical procedural information. To address this scarcity, we …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 8.5
2026-04-28 · Arnon Mazza, Elad Levi
General AI
Deploying guardrails for custom policies remains challenging, as generic safety models fail to capture task-specific requirements, while prompting LLMs suffers from inconsistent boundary-case performance and high inference costs. Training custom classifiers achieves both accuracy and efficiency, yet demands substantial…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.5
2026-05-06 · Sohom Datta, Alex Nahapetyan, William Enck, Alexandros Kapravelos
Research Track B · General AI
Large language models (LLMs) are increasingly being integrated into web browsers to create agentic browsing systems that execute actions on behalf of the user. Prior work considering the security of agentic browsers focuses exclusively on indirect prompt-injection attacks. However, by failing to consider traditional we…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-05-07 · Pranav Mantini, Shishir K. Shah
Research Track A
We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed in…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.5
2026-05-11 · Wei Chow, Linfeng Li, Xian Sun, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, Tian Ye, Xian Wang, Jinbin Bai, Shilin Xu, Xiangtai Li, Junting Pan, Shaoteng Liu, Ran Zhou, Tianshu Yang, Songhua Liu
Research Track A · General AI
Diffusion models dominate image editing, yet their global denoising mechanism entangles edited regions with surrounding context, causing modifications to propagate into areas that should remain intact. We propose a fundamentally different approach by leveraging Masked Generative Transformers (MGTs), whose localized tok…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.5
2026-05-14 · Gloria Fernández-Nieto, Kiyoshige Garcés, Mladen Raković, Tongguang Li, Xinyu Li, Linxuan Zhao, Dragan Gašević
Research Track A
Background: Abilities for effective self-regulated learning (SRL) are critical for lifelong learning, particularly during adolescence when these skills consolidate and strongly influence future learning. Their importance has grown with the rise of online and blended education. Yet, little is known about how secondary s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-05-21 · Farhat Shaikh, Ayan Banerjee, Sandeep Gupta
General AI
We introduce EMMA, a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Unlike prior video-only approaches that struggle with occluded states, hidden actuation inputs, or assumptions about known …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.5
2026-05-21 · Sayantani Ghosh, Rajashik Datta, Amit Kumar Das, Amlan Chakrabarti
Research Track A
Transformer-based language models such as BERT having 110M+ parameters have revolutionized natural language understanding, yet their internal mechanisms remain largely opaque to researchers and practitioners. Traditional attention-based interpretability methods often emphasize structurally important but semantically we…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.5
2026-05-29 · Fatima Ahmad Muazu, Festus Adedoyin, Huseyin Dogan, Abiodun Adedeji, Melike Akca, Olumuyiwa Ayorinde
Research Track A · General AI
This study investigates how UX research (UXR) principles, combined with Large Language Model (LLM)-supported analysis, can be used to improve the quality of requirements for mobile learning systems designed for learners with cognitive disabilities. Using the UXR Point-of-View (PoV) pyramid as a methodological framework…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-06-01 · Richard Schwarzkopf, Fabian Immel, Alexander Blumberg, Jonas Merkert, Nils Rack, Kaiwen Wang, Fabian Konstantinidis, Julian Truetsch, Carlos Fernandez, Annika Bätz, Kevin Rösch, Marlon Steiner, Willi Poh, Yinzhe Shen, Royden Wagner, Felix Hauser, Dominik Strutz, Jaime Villa, Gleb Stepanov, Holger Caesar, Ömer Şahin Taş, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller
General AI
Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cam…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-06-08 · Ziqian Zhong, Ivgeni Segal, Ivan Bercovich, Shashwat Saxena, Kexun Zhang, Aditi Raghunathan
General AI
Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both leaderboard rankings a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-06-10 · Yuchen Xian, Yunqiu Xu, Yang He, Yi Yang
General AI
Multimodal image fusion aims to integrate complementary information from different modalities into a fused image that preserves rich local details while maintaining globally consistent appearance. Existing approaches build shared representations on 2D feature grids, which excel at modeling local structures but offer li…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-06-10 · Dingyu Yao, Junhao Zhou, Chenxu Yang, Chuanyu Qin, Haowen Hou, Zheming Liang, Congcong Wang, Yuhang Cao, Shenglong Ye, Shuai Xie, Shuhuan Gu, Haoyang Huang, Qingyi Si, Nan Duan, Jiaqi Wang
General AI
Many moments in the real world do not wait for a user to ask. A fire starts on a security monitor, an expression flickers across a video call, or a product a viewer wants flashes by in a livestream. Yet today's large models remain mostly turn-based by design: they answer only when addressed, and even video-call apps th…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.4
2026-06-25 · Fenghe Guo, Runjie Shen, Chenyang Sun, Junrui Zhang, Quanxi Zhan, Yongchun Wang, Junjie Zhang
General AI
Hydropower tunnel inspection is critical for infrastructure integrity yet remains inefficient and hazardous using manual methods. We propose FLISP (Fast LiDAR-IMU Synchronized Path Planner), a mapless planning framework for cooperative UGV-UAV inspection. Unlike traditional map-based paradigms, FLISP features three cor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-03-23 · Donald Shenaj, Federico Errica, Antonio Carta
General AI
Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the pers…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-03-24 · Wei Sun, Ting Wang, Xinran Tian, Wanshun Lan, Xuhan Feng, Haoyue Li, Fangxin Wang
General AI
Existing LLM-based Kubernetes diagnostic systems cannot learn from operational experience, operating on static knowledge bases without improving from past resolutions. We present MetaKube, an experience-aware LLM framework through three synergistic innovations: (1) an Episodic Pattern Memory Network (EPMN) that abstrac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-03-26 · Moiz Sadiq Awan, Muhammad Haris Noor, Muhammad Salman Munaf
Research Track A · General AI
Automated benchmarks dominate the evaluation of large language models, yet no systematic study has compared user satisfaction, adoption motivations, and frustrations across competing platforms using a consistent instrument. We address this gap with a cross-platform survey of 388 active AI chat users, comparing satisfac…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-03-26 · Sk Miraj Ahmed, Xi Yu, Yunqi Li, Yuewei Lin, Wei Xu
General AI
Accurate biodiversity identification from large-scale field data is a foundational problem with direct impact on ecology, conservation, and environmental monitoring. In practice, the core task is taxonomic prediction - inferring order, family, genus, or species from imperfect inputs such as specimen images, DNA barcode…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-03-26 · Lei Wang, YuXin Song, Ge Wu, Haocheng Feng, Hang Zhou, Jingdong Wang, Yaxing Wang, Jian Yang
General AI
Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applications such as personalized advertising and virtual try-on. In practice, existing R2V methods typically introduce additional high-level seman…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-03-29 · Fengxiang Li, Han Zhang, Haoyang Huang, Jinghui Wang, Jinhua Hao, Kun Yuan, Mengtong Li, Minglei Zhang, Pengcheng Xu, Wenhao Zhuang, Yizhen Shao, Zongxian Feng, Can Tang, Chao Wang, Chengxiao Tong, Fan Yang, Gang Xiong, Haixuan Gao, Han Gao, Hao Wang, Haochen Liu, Hongliang Sun, Jiabao Li, Jingwen Chang, Jun Du, Junyi Peng, Leizhen Cui, Meimei Jing, Mingqi Wu, Shangpeng Yan, Shaotong Qi, Suzhe Xu, Wenxuan Zhao, Xianda Sun, Xuan Xie, Yanbo Wang, Yao Xia, Yinghan Cui, Yingpeng Chen, Yong Wang, Yuze Shi, Zhiwei Shen, Ziyu Wang, Ming Sun, Lin Ye, Bin Chen
General AI
We present KAT-Coder-V2, an agentic coding model developed by the KwaiKAT team at Kuaishou. KAT-Coder-V2 adopts a "Specialize-then-Unify" paradigm that decomposes agentic coding into five expert domains - SWE, WebCoding, Terminal, WebSearch, and General - each undergoing independent supervised fine-tuning and reinforce…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-03-30 · Gabriele Gemmi, Michele Polese, Tommaso Melodia
General AI
The large-scale deployment of 5G networks has not delivered the expected return on investment for mobile network operators, raising concerns about the economic viability of future 6G rollouts. At the same time, surging demand for Artificial Intelligence (AI) inference and training workloads is straining global compute …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-03-30 · Maoguo Gao, Zejun Zhu, Zhiming Sun, Zhengwei Ma, Longze Yuan, Zhongjing Ma, Zhigang Gao, Jinhui Zhang, Suli Zou
General AI
Open-Vocabulary Object Navigation (OVON) requires an embodied agent to locate a language-specified target in unknown environments. Existing zero-shot methods often reason over dense frontier points under incomplete observations, causing unstable route selection, repeated revisits, and unnecessary action overhead. We pr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-03-30 · Mohamed Elgouhary, Amr S. El-Wakeel
General AI
Pure Pursuit (PP) is a widely used path-tracking algorithm in autonomous vehicles due to its simplicity and real-time performance. However, its effectiveness is sensitive to the choice of lookahead distance: shorter values improve cornering but can cause instability on straights, while longer values improve smoothness …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-03-30 · Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khanh-Duy Le, Minh-Triet Tran, Tam V. Nguyen, Trung-Nghia Le
General AI
The Four Books have shaped East Asian intellectual traditions, yet their multi-layered interpretive complexity limits their accessibility in the digital age. While traditional bilingual commentaries provide a vital pedagogical bridge, computational frameworks are needed to preserve and explore this wisdom. This paper b…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-03-31 · Theodora Panagea, Nikolaos Koursioumpas, Lina Magoula, Ramin Khalili
General AI
Progressing toward a new generation of mobile networks, a clear focus on integrating distributed intelligence across the system is observed to drive performance, autonomy, and real-time adaptability. Federated learning (FL) stands out as a key emerging technique, enabling on-device model training while preserving data …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-03-31 · Yudong Gao, Zongjie Li, Yuanyuan Yuan, Zimo Ji, Pingchuan Ma, Shuai Wang
General AI
LLM-based coding agents rely on skills, pre-packaged instruction sets that extend agent capabilities, yet every token of skill content injected into the context window incurs both monetary cost and attention dilution. To understand the severity of this problem, we conduct a large-scale empirical study of 55,315 …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-01 · Kazuki Yano, Jun Suzuki, Shinji Watanabe
General AI
Adapting pre-trained text Large Language Models (LLMs) into Speech Language Models (Speech LMs) via continual pretraining on speech data is promising, but often degrades the original text capabilities. We propose Multimodal Depth Upscaling, an extension of an emerging strategy in continual LLM pre-training, where new t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-02 · Abinitha Gourabathina, Inkit Padhi, Manish Nagireddy, Subhajit Chaudhury, Prasanna Sattigeri
General AI
For Large Language Models (LLMs) to be reliably deployed, models must effectively know when not to answer: abstain. Reasoning models, in particular, have gained attention for impressive performance on complex tasks. However, reasoning models have been shown to have worse abstention abilities. Taking the vulnerabilities…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-02 · Karan Taneja, Anjali Singh, Ashok K. Goel
General AI
Multimodal Large Language Models (MLLMs) offer an opportunity to support multimedia learning through conversational systems grounded in educational content. However, while conversational AI is known to boost engagement, its impact on learning in visually-rich STEM domains remains under-explored. Moreover, there is limi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-02 · Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan, Ta-Ying Cheng
General AI
Existing video object removal methods excel at inpainting content "behind" the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions, such as collisions with other objects, current models fail to correct them and produce impl…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-03 · Zhenyu Gao, Wenxi Jiang, Yutong Yan
General AI
Prior research shows that large language models (LLMs) exhibit systematic extrapolation bias when forming predictions from both experimental and real-world data, and that prompt-based approaches appear limited in alleviating this bias. We propose a supervised fine-tuning (SFT) approach that uses Low-Rank Adaptation (Lo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-03 · Haotian Xiang, Bingcong Li, Qin Lu
General AI
When deploying large language models (LLMs) to safety-critical applications, uncertainty quantification (UQ) is of utmost importance to self-assess the reliability of the LLM-based decisions. However, such decisions typically suffer from overconfidence, particularly after parameter-efficient fine-tuning (PEFT) for down…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-06 · Rafael O. Jarczewski, Gabriel U. Talasso, Leandro Villas, Allan M. de Souza
General AI
Although Federated Learning (FL) promises privacy and distributed collaboration, its effectiveness in real-world scenarios is often hampered by the stochastic heterogeneity of clients and unpredictable system dynamics. Existing static optimization approaches fail to adapt to these fluctuations, resulting in resource un…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-06 · Mohammad Zangooei, Jannis Weil, Amr Rizk, Mina Tahmasbi Arashloo, Raouf Boutaba
General AI
Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congestion control. For safe deployment, however, it is critical to reason about how agents behave across the range of system st…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-07 · Pranjal Aggarwal, Graham Neubig, Sean Welleck
General AI
Computer-use agents hold the promise of assisting in a wide range of digital economic activities. However, current research has largely focused on short-horizon tasks over a limited set of software with limited economic value, such as basic e-commerce and OS-configuration tasks. A key reason is that creating environmen…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-07 · Hao Chen, Fang Qiu, Fangchao Dong, Defei Yang, Eve Bohnett, Li An
General AI
This study proposes a lightweight multimodal adaptation framework to bridge the representation gap between RGB-pretrained VLMs and thermal infrared imagery, and demonstrates its practical utility using a real drone-collected dataset. A thermal dataset was developed from drone-collected imagery and was used to fine-tune…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-07 · Sangwook Lee, Sang Won Lee, Adnan Abbas, Young-Ho Kim, Yan Chen
General AI
Modern task-oriented chatbots present GUI elements alongside natural-language dialogue, yet the agent's role has largely been limited to interpreting natural-language input as GUI actions and following a linear workflow. In preference-driven, multi-step tasks such as booking a flight or reserving a restaurant, earlier …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-09 · Zhiyuan Wang, Erzhen Hu, Mark Rucker, Laura E. Barnes
General AI
Personal AI tools can now be generated from natural-language requests, but they often remain isolated after creation. We present PSI, a shared-state architecture that turns independently generated modules into coherent instruments: persistent, connected, and chat-complementary artifacts accessible through both GUIs and…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-12 · Wenhao Zhang, Lin Mu, Li Ni, Peiquan Jin, Yiwen Zhang
General AI
Low-rank adaptation (LoRA) is a widely used strategy for efficient fine-tuning of large language models (LLMs), but its strictly linear structure fundamentally limits expressive capacity. The bilinear formulation of weight updates captures only first-order dependencies between low-rank factors, restricting the modeling…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-13 · J. Oppliger, M. Stifter, A. Rüegg, I. Biało, L. Martinelli, P. G. Freeman, D. Prabhakaran, J. Zhao, Q. Wang, J. Chang
General AI
Automation underpins progress across scientific and industrial disciplines. Yet, automating tasks requiring interpretation of abstract visual information remains challenging. For example, crystal alignment strongly relies on humans with the ability to comprehend diffraction patterns. Here we introduce an autonomous syst…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-13 · Chenxi Qing, Junxi Wu, Zheng Liu, Yixiang Qiu, Hongyao Yu, Bin Chen, Hao Wu, Shu-Tao Xia
General AI
Large language models (LLMs) are now capable of generating highly fluent textual content. While they offer significant convenience to humans, they also introduce various risks, like phishing and academic dishonesty. Numerous research efforts have been dedicated to developing algorithms for detecting AI-generated …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-13 · Wei Zhao, Zhe Li, Peixin Zhang, Jun Sun
General AI
Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly inc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-13 · Adam Stein, Davis Brown, Hamed Hassani, Mayur Naik, Eric Wong
General AI
To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare, complex, and sometimes even adversarially hidden and only detectable when multiple traces are analyzed together. These challenges arise in diverse settings such as misuse campa…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-13 · Shiyu Teng, Jiaqing Liu, Hao Sun, Yu Li, Shurong Chai, Ruibo Hou, Tomoko Tateyama, Lanfen Lin, Yen-Wei Chen
General AI
Depression remains widely underdiagnosed and undertreated because stigma and subjective symptom ratings hinder reliable screening. To address this challenge, we propose a coarse-to-fine, multi-stage framework that leverages large language models (LLMs) for accurate and interpretable detection. The pipeline performs bin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-14 · Jiahao Shao, Anam Nawaz Khan, Christopher Brett, Tom Berg, Xueping Li, Bing Yao
General AI
Pathology reports serve as the definitive record for breast cancer staging, yet their unstructured format impedes large-scale data curation. While Large Language Models (LLMs) offer semantic reasoning, their deployment is often limited by high computational costs and hallucination risks. This study introduces a paramet…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-16 · Zoe Fingleton, Nazanin Siavash, Armin Moin
General AI
In this paper, we focus on automating two of the widely used Verification and Validation (V&V) activities in the Software Development Lifecycle (SDLC): Software testing and software inspection (also known as review). Concerning the former, we concentrate on automated test case generation using Large Language Models (LL…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-16 · Mélanie Roschewitz, Kenneth Styppa, Yitian Tao, Jiwoong Sohn, Jean-Benoit Delbrouck, Benjamin Gundersen, Nicolas Deperrois, Christian Bluethgen, Julia Vogt, Bjoern Menze, Farhad Nooralahzadeh, Michael Krauthammer, Michael Moor
General AI
Vision-language models (VLM) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or re…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-16 · Aihua Li
General AI
Flow matching retains the generation quality of diffusion models while enabling substantially faster inference, making it a compelling paradigm for generative modeling. However, when applied to language modeling, it exhibits fundamental limitations in representing complex latent distributions with irregular geometries,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-16 · Joel Perca, Luis Sante, Juanpablo Heredia, Joao Rulff, Claudio Silva, Jorge Poco
General AI
Extracting actionable insights from long-duration urban videos is often labor-intensive: analysts must manually sift through raw footage to pinpoint target events or uncover broader behavioral trends. In this work, we present URBANCLIPATLAS, a visual analytics system for exploring long urban videos recorded at street i…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-21 · Nico Baumgart, Markus Lange-Hegermann, Jan Henze
General AI
Efficient semantic access to industrial product data is a key enabler for factory automation and emerging LLM-based agent workflows, where both human engineers and autonomous agents must identify suitable components from highly structured catalogs. However, the vocabulary mismatch between natural-language queries and a…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-21 · Segun Aroyehun, Stephan Lewandowsky, David Garcia
General AI
The pursuit of truth is central to democratic deliberation and governance, yet political discourse reflects varying epistemic orientations, ranging from evidence-based reasoning grounded in verifiable information to intuition-based reasoning rooted in beliefs and subjective interpretation. We introduce a scalable appro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-21 · Boyan Shi, Wei Chen, Shuyuan Zhao, Junfeng Shen, Shengnan Guo, Shaojiang Wang, Huaiyu Wan
General AI
The combination of Mixture-of-Experts (MoE) and Low-Rank Adaptation (LoRA) has shown significant potential for enhancing the multi-task learning capabilities of Large Language Models. However, existing methods face two primary challenges: (1) Imprecise Routing in the current MoE-LoRA method fails to explicitly match inp…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-21 · Austin Coursey, Abel Diaz-Gonzalez, Marcos Quinones-Grueiro, Gautam Biswas
Research Track A · General AI
Reinforcement learning (RL) offers a compelling data-driven paradigm for synthesizing controllers for complex systems when accurate physical models are unavailable; however, most existing control-oriented RL methods assume stationarity and, therefore, struggle in real-world non-stationary deployments where system dynam…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-22 · Guotao Liang, Zhangcheng Wang, Juncheng Hu, Haitao Zhou, Ziteng Xue, Jing Zhang, Dong Xu, Qian Yu
General AI
Multimodal Large Language Models (MLLMs) have shown promising capabilities in generating Scalable Vector Graphics (SVG) via direct code synthesis. However, existing paradigms typically adopt an open-loop "blind drawing" approach, where models generate symbolic code sequences without perceiving intermediate visual outco…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-22 · Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, Yu Feng
General AI
LLM agents have begun to find real security vulnerabilities that human auditors and automated fuzzers missed for decades, in source-available targets where the analyst can build and instrument the code. In practice the work is split among several agents, wired together by a harness: the program that fixes which roles e…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-23 · Songen Gu, Yuhang Zheng, Weize Li, Yupeng Zheng, Yating Feng, Xiang Li, Yilun Chen, Pengfei Li, Wenchao Ding
General AI
Recently, end-to-end robotic manipulation models have gained significant attention for their generalizability and scalability. However, they often suffer from limited robustness to camera viewpoint changes when training with a fixed camera. In this paper, we propose VistaBot, a novel framework that integrates feed-forw…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-23 · Pegah Khayatan, Jayneel Parekh, Arnaud Dapogny, Mustafa Shukor, Alasdair Newson, Matthieu Cord
General AI
Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the vision backbone or the dominance of the…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-24 · Hong Su
General AI
Autonomous robots operating in open environments need the ability to continuously handle tasks that are not covered by predefined local methods. However, existing approaches often rely on repeated large-language-model (LLM) interaction for uncovered tasks, and even successful executions or observed successful external …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-24 · Jia Li, Hongyi Deng, Yiran Zhang, Kechi Zhang, Tianqi Shao, Tiankuo Zhao, Weinan Wang, Zhi Jin, Ge Li, Yang Liu, Yingtao Fang, Yihong Dong
General AI
Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress using Large Language Models (LLMs) for code generation. Many benchmarks like HumanEval and EvoCodeBench have been created to evaluate LLMs by requiring them to generate code fr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-04-27 · Dibyadip Chatterjee, Zhanzhong Pang, Fadime Sener, Yale Song, Angela Yao
General AI
Streaming video models should respond the moment an event unfolds, not after the moment has passed. Yet existing online VideoQA benchmarks remain largely retrospective. They pause the video at fixed timestamps, pose questions about current or past events, and score models only at those moments. This protocol leaves str…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-27 · Sinin Zhang, Yunfei Xie, Yuxuan Cheng, Haoyu Zhang, Tong Zhang
General AI
Vision-Language Models (VLMs) have demonstrated strong performance on textbook-style physics problems, yet they frequently fail when confronted with dynamic real-world scenarios that require temporal consistency and causal reasoning across frames. We identify two fundamental challenges underlying these failures: (1) sp…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-27 · Zijian Guo, İlker Işık, H. M. Sabbir Ahmad, Wenchao Li
General AI
Specification-guided reinforcement learning (RL) provides a principled framework for encoding complex, temporally extended tasks using formal specifications such as linear temporal logic (LTL). While recent methods have shown promising results, their ability to generalize across unseen specifications and diverse enviro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-28 · Chu-Cheng Lin, Eugene Ie
General AI
Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewards (RLVR) when the initial success probability $p_0$ is small. Using the Tsallis $q$-logarithm, we define a loss family $J_Q$ that interpolates between RLVR (at $q{=}0$…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-28 · Zeming Dong, Yuejun Guo, Qiang Hu, Yao Zhang, Maxime Cordy, Hao Liu, Mike Papadakis, Yongqiang Lyu
General AI
Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information em…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-28 · Steve Coyne
General AI
Preference-based alignment methods, most prominently Reinforcement Learning with Human Feedback (RLHF), use the judgments of human annotators to shape large language model behaviour. However, the normative role of these judgments is rarely made explicit. I distinguish three conceptual models of that role. The first is …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-28 · Even Eilertsen, Vasileios Mavroeidis, Gudmund Grov
General AI
Security analysts are overwhelmed by the volume of alerts and the low context provided by many detection systems. Early-stage investigations typically require manual correlation across multiple log sources, a task that is usually time-consuming. In this paper, we present an experimental, agentic workflow that leverages…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-28 · Clinton Enwerem, Shreya Kalyanaraman, John S. Baras, Calin Belta
General AI
Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-29 · Zylan Benjert, Júlia Komjáthy, Johannes Lengler, John Lapinskas, Ulysse Schaller
General AI
It is a fundamental question in epidemiology to estimate, model and predict the growth rate of a pandemic. Analogously, analysing the diffusion of innovation, (fake) news, memes, and rumours is of key importance in the social sciences. The resulting epidemic growth curves can be classified according to their growth rat…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-29 · Manar Aljohani, Brandon Ho, Kenneth McKinley, Dennis Ren, Xuan Wang
General AI
Accurate and consistent Emergency Severity Index (ESI) assignment remains a persistent challenge in emergency departments, where highly variable free-text triage documentation contributes to mistriage and workflow inefficiencies. This study evaluates whether open-source small language models (SLMs) can serve as reliabl…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-29 · Yiqi Liu, Noelle Crawford, Michael Wang, Jilong Xue, Jian Huang
General AI
To overcome the well-known memory bottleneck of AI chips, 3D stacked architectures that employ advanced packaging technology with high-density through-silicon vias (TSVs) pins have proven to be a promising solution. The 3D-stacked AI chip enables ultra-high memory bandwidth between compute and memory by stacking numero…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-04-30 · Lincan Li, Zheng Chen, Yushun Dong
General AI
Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging. Existing graph construction methods, whether correlation-based or learning-based, often generate redundant or irrelevant edges due to the noisy nature of EEG data. Thi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.3
2026-05-01 · Jinpai Zhao, Nishant Panda, Yen Ting Lin, Eirik Valseth, Diane Oyen, Clint Dawson
General AI
We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closures, boundary handling) in a query-conditioned way. Rather than learning a monolithic map, HyCOP learns a policy over short programs - which module to apply and for how l…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-03 · Arpit Garg, Simon Lucey, Hemanth Saratchandran
General AI
Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method that restricts weight updates to low-rank adapters, introducing a fixed low-rank inductive bias by optimizing in a low-dimensional subspace. In this work, we question whether a fixed-rank constraint is the most effective inductive bia…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-03 · Matheus Kunzler Maldaner, Adam Fourney, Amanda Swearngin, Hussein Mozzanar, Gagan Bansal, Maya Murad, Rafah Hosn, Saleema Amershi, Hussein Mozannar
General AI
AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default model of agent behavior is continuous action: issuing tool calls, refreshing pages, searching for alternatives, or otherwise trying to force progress. This is the wrong approach for many long-running tasks, which ar…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-04 · Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Tianjun Yao, Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Hao Li, Salman Khan, Zhiqiang Shen
General AI
As AI writing assistants become increasingly integrated into real-world drafting and revision workflows, many documents are no longer purely human-written or AI-generated, but instead result from progressive human-AI co-editing. However, existing AI-text detection benchmarks largely focus on final outputs and provide l…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-04 · Renjith Prasad, Chathurangi Shyalika, Anushka Pawar, Amit Sheth
General AI
Multimodal generative models produce fluent outputs but remain unreliable when generation must respect structured, domain-specific, or safety-critical knowledge. Existing methods incorporate knowledge through mechanisms such as prompt augmentation, guidance, latent editing, or fine-tuning, yet they are typically catego…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-05 · Duc Tri Tran, Trung Thanh Nguyen, Vijay John, Phi Le Nguyen, Yasutomo Kawanishi
General AI
Video Text Spotting (VTS) is essential for urban surveillance and intelligent transportation systems, enabling automated reading of street signs, vehicle markings, and scene text in video streams. However, reliable recognition remains challenging due to dynamic video factors common in surveillance scenarios, including …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-08 · Tianyu Ruan, Fengzhuo Zhang, Shuche Wang, Shihua Zhang
General AI
Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Despite its efficiency advantage over Adam and SGD, the feature-learning advantage of Muon remains unclear. This paper investigates Muon's feature-learning advantage through the lens of robustn…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-08 · Jiarui Yao, Xiangxin Zhou, Penghui Qi, Wee Sun Lee, Liefeng Bo, Tianyu Pang
General AI
Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control essential for stable optimization. Mainstream methods such as PPO and GRPO approximate th…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-08 · Wendy K. Tam
General AI
The ambition behind alignment training is to make large language models safe and useful. The primary mechanism, reinforcement learning from human feedback (RLHF), shapes the behavior of deployed language models by aligning them with "human values." Yet the process is opaque. What values are being encoded; whose value…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-09 · Junke Wang, Xiao Wang, Jiacheng Pan, Xuefeng Hu, Feng Li, Jingxiang Sun, Chaorui Deng, Zilong Chen, Yunpeng Chen, Kaibin Tian, Matthew Gwilliam, Hao Chen, Danhui Guan, Kun Xu, Weilin Huang, Zuxuan Wu, Haoqi Fan, Yu-Gang Jiang, Zhenheng Yang
General AI
This paper introduces ARM, a discrete representation-based AutoRegressive Model that unifies image understanding, generation, and editing within a next-token prediction framework. ARM is built on three efforts: first, we train a discrete semantic visual tokenizer that maps images into compact token sequences. Our token…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-09 · Andrew Kang, Priya Narasimhan
General AI
We recast pass evaluation in football (soccer) as a Monte Carlo Tree Search (MCTS)-like evaluation problem whose components mostly exist in the literature under different names: a value model (possession value), a world model (multi-agent trajectories with ball interactions), and a policy over counterfactual actions (s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-09 · Ilay Kamai, Hugues Van Assel, Aviv Regev, Hagai B. Perets, Randall Balestriero
General AI
Cross-modal alignment (CA) and cross-modal prediction (CP) are the dominant paradigms for multimodal representation learning, yet there is no systematic understanding of when each succeeds, when each fails, and when cross-modal training helps at all -- a gap that leaves practitioners, especially in scientific domains l…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-11 · Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao, Fanjin Zhang, Jian Song, Lei Hou, Juanzi Li
General AI
LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-12 · Huong Binh Vu
General AI
Rapid post-event landslide mapping is essential for disaster response but remains difficult to automate due to extreme class imbalance. This study evaluates whether Clay v1.5, a Geospatial Foundation Model (GFM), can improve pixel-level landslide segmentation on the Landslide4Sense (L4S) benchmark, which contains 3,799…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-13 · Daksh Mittal, Tommaso Castellani, Thomson Yen, Naimeng Ye, Fangyu Wu, Minghui Chen, Tiffany Cai, Emmanouil Koukoumidis, William Zeng, Hongseok Namkoong
General AI
We envision continually learning agentic systems that become more useful over time: as they encounter sequences of related tasks, they should infer the hidden structure shared across those tasks and use it to improve future decisions. This cross-task experiential learning capability is pivotal in domains such as person…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-14 · Xiongjun Guan, Jianjiang Feng, Jie Zhou
Research Track A · General AI
Small-area fingerprint sensing on mobile devices creates a fundamental mismatch between acquisition and recognition: each touch captures only a tiny, pose-varying local patch, while reliable biometric matching ultimately requires a stable and sufficiently complete fingerprint representation. Existing pipelines largely …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-15 · Violet Xiang, Amrith Setlur, Chase Blagden, Nick Haber, Aviral Kumar
General AI
Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as d…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-16 · Sajad Movahedi, Vera Milovanović, Shlomo Libo Feigin, Alexander Theus, Thomas Hofmann, Valentina Boeva, T. Konstantin Rusch, Antonio Orvieto
General AI
Looped architectures provide an inductive bias toward learning step-by-step procedures for tasks that require compositional reasoning. The number of effective layers reached by looping determines the quality of the solution these models find. Like deep architectures, looped architectures are prone to a signal propagati…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-16 · Tianyu Liu, Ziqing Wang, Zhaokang Liang, Tong Ding, Peter Humphrey, Lorraine Colón-Cartagena, Emily Ling-Lin Pai, Kenneth Tou En Chang, Mohamed Kahila, Jonathan Chong Kai Liew, Tinglin Huang, Rex Ying, Kaize Ding, Faisal Mahmood, Wengong Jin
General AI
Predicting immune biomarkers associated with the tumor immune microenvironment (TIME) is critical for advancing precision oncology, yet existing approaches are largely limited to single image modalities and suffer from insufficient resolution and incomplete utilization of complementary clinical and biological informati…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-17 · Po-Han Cheng, Chia-Mu Yu, Ying-Dar Lin, Yu-Sung Wu, Wei-Bin Lee
General AI
Code large language models increasingly retrieve external code context from repositories, documentation, issue threads, and coding-agent environments, creating an indirect prompt-injection surface where attackers hide instructions in comments, strings, identifiers, or decoy code. We propose CodeSentinel, a three-layer …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-06-23 · Tian Zheng, Kai-Tai Hsu
General AI
Agentic data analysis systems produce rich outputs, including code, numerical results, and verbal diagnostics. This makes them more challenging to evaluate than single-turn LLM responses. It is therefore necessary to distinguish genuine disagreement between an agent's output and a ground-truth answer from grading artif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-06-23 · Marvin Rüdt, Hao Pang, Constantin Enke, Zäzilia Seibold, Kai Furmans
General AI
Autonomous mobile robots operating in intralogistics environments rely on geometric maps for localization and navigation, but lack semantic understanding of objects and their contextual properties. We present a contextual semantic mapping pipeline that combines SLAM-based geometric mapping, SAM-based instance segmentat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-06-24 · Fariba Tohidinezhad, Douwe J. Spaanderman, Natalia Oviedo Acosta, Kaouther Mouheb, Karthik Prathaban, David F. Hanff, Dirk J. Grünhagen, Cornelis Verhoef, Joris M. van Sabben, Evelyne Roets, Jette J. Slettenhaar, Hans Gelderblom, Ingrid M. E. Desar, Anna K. L. Reyners, Neeltje Steeghs, Stefan Klein, Martijn P. A. Starmans
General AI
Background: Response to neoadjuvant imatinib in gastrointestinal stromal tumors (GISTs) is highly variable and cannot be reliably predicted using current clinical or molecular markers. This study developed and evaluated an explainable multimodal deep learning framework integrating computed tomography (CT) imaging and c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-06-24 · Giulian Biolo, Michael Tezza, Yuanjun Gong, Fabio Massacci
General AI
Software vulnerability remediation is a cognitively demanding task that requires specialized security expertise often lacking in general developers. In the meantime, Large Language Model (LLM)-assisted tools show potential in vulnerability detection, location, and repair tasks. [Hypothesis:] While LLM-assistance is h…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-06-24 · JoungBin Lee, Jaewoo Jung, Jongmin Lee, Tongmin Kim, Hyunsung Kim, Takuya Narihira, Kazumi Fukuda, Jahyeok Koo, Jisang Han, Yuki Mitsufuji, Seungryong Kim
General AI
Synthesizing a novel-view video from a monocular reference video along a target camera trajectory requires both geometric consistency and motion fidelity with respect to the reference video. Existing methods based on explicit 3D representations are limited by the accuracy of off-the-shelf reconstruction modules, which …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.2
2026-06-24 · Alexandre Bouayad
General AI
Large language models (LLMs) attain remarkable surface fluency on code, yet they neither formally guarantee the syntactic validity of their output nor leverage the hierarchical structure defining the target language. While existing constrained-decoding frameworks address the former, they operate under rigid assumptions…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.0
2026-02-27 · Abisheka Pitumpe, Amir Rahmati
Research Track B · General AI
Job-based smishing scams, where victims are recruited under the guise of remote job opportunities, represent a rapidly growing and understudied threat within the broader landscape of online fraud. In this paper, we present Anansi, the first scalable, end-to-end measurement pipeline designed to systematically engage wit…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.0
2026-03-31 · Shanxian Lin, Yuichi Nagata, Haichuan Yang
Research Track A
Metaheuristic algorithms such as Particle Swarm Optimization (PSO) and Evolutionary Algorithms (EA) excel at exploring solution spaces but lack mechanisms to accumulate and reuse procedural knowledge from successful search trajectories. This paper proposes Associative Constructive Evolution (ACE), a framework that enha…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 8.0
2026-04-05 · Leonardo Bitzki, Diego Kreutz, Tiago Heinrich, Douglas Fideles, Leandro Bertholdo, Silvio Quincozes, Angelo Diniz
Research Track A
Cybersecurity research increasingly depends on reproducible evidence, such as traffic traces, logs, and labeled datasets, yet most public datasets remain static and offer limited support for controlled re-execution and traceability, especially in heterogeneous multi-protocol environments. This paper presents NetSecBed,…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.0
2026-04-12 · Yuan Sun, Hong Yi, Jinyuan Liu
Research Track A
Personalized learning systems are almost universally designed around a single objective: help people acquire knowledge and skills more efficiently. We argue this framing misses the more consequential problem. The most damaging failures in human life (financial ruin, health collapse, professional obsolescence) are rarely …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.0
2026-04-15 · Mohammad Nooraiepour, Zezhang Song, Wei Li, Sarah Perez
Research Track A
Accurate methane sorption prediction across heterogeneous coal ranks requires models that combine thermodynamic consistency, efficient knowledge transfer across data-scarce geological systems, and calibrated uncertainty estimates, capabilities that are rarely addressed together in existing frameworks. We present a phys…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.0
2026-04-27 · Phung Gia Huy, Hai An Vu, Minh-Phuc Truong, Thang Duc Tran, Linh Ngo Van, Thanh Hong Nguyen, Trung Le
Research Track A · General AI
Representation learning is fundamental to NLP, but building embeddings that work well at different computational budgets is challenging. Matryoshka Representation Learning (MRL) offers a flexible inference paradigm through nested embeddings; however, learning such structures requires explicit coordination of how inform…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 8.0
2026-05-04 · Yangfu Li, Yuning Gong, Hongjian Zhan, Teng Li, Yuanhuiyi Lyu, Tianyi Chen, Qi Liu, Ziyuan Huang, Zhihang Zhong, Dandan Zheng, Yue Lu
General AI
Despite the success of Large Vision-Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods introduce geometric priors from visual experts as additional supervision. However, we obs…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-05-18 · Woongyeng Yeo, Yumin Choi, Taekyung Ki, Sung Ju Hwang
General AI
Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rewards reveal whether a task succeeds, but not which intermediate actions caused the outcome or how they should be corrected. Recent methods alleviate this issue by generating rewards or textual hints from turn-level act…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.0
2026-05-19 · Prashant Pandey, Himanshu Kumar, Devineni Sri Venkatraya Chowdary, Brejesh Lall
Research Track A
Evolving data streams induce joint nonstationarity in continual semantic segmentation, where semantic classes, input distributions, and supervision availability change simultaneously over time. This setting reflects practical structured prediction systems, yet remains largely unexplored in prior continual learning work…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-05-19 · Juncheng Wu, Hardy Chen, Haoqin Tu, Xianfeng Tang, Freda Shi, Hui Liu, Hanqing Lu, Cihang Xie, Yuyin Zhou
General AI
Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this work, we systematically study the interplay between perception and reasoning in VLM …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-05-26 · Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee
General AI
Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behavio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-05-29 · Jian Mu, Tianyi Lin, Chengwei Qin, Zhongxiang Dai, Yao Shu
General AI
Large language models are increasingly deployed in multi-turn interactive settings where users or environments can iteratively provide lightweight feedback. Unfortunately, optimizing such behavior presents a sharp dilemma in practice: online reinforcement learning is able to effectively address multi-turn dynamics but …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-05-29 · Zhenhao Yang, Xiaoshi Wu, Zhengyao Lv, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Kun Gai, Kwan-Yee K. Wong
General AI
Recent advances in video generative models have promoted rapid progress in controllable world models. However, maintaining fine-grained spatio-temporal consistency under long-horizon reasoning remains a key challenge. In this work, we move beyond explicit 3D memory and coarse frame-level implicit modeling, and propose …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.0
2026-06-15 · Hyungmin Kim, Minsoo Kim, Hongseok Kim, Jungwook Choi
Research Track A · General AI
Multi-turn LLM serving accumulates dialogue history whose Key-Value (KV) cache grows with every turn and every user, quickly exceeding the model weights themselves and making memory, not compute, the binding constraint on throughput. Non-uniform KV compression, which allocates heterogeneous budgets across attention…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-04-28 · Shuxiang Cao, Zijian Zhang, Abhishek Agarwal, Grace Bratrud, Niyaz R. Beysengulov, Daniel C. Cole, Alejandro Gómez Frieiro, Elena O. Glen, Hao Hsu, Gang Huang, Raymond Jow, Greshma Shaji, Tom Lubowe, Ligeng Zhu, Luis Mantilla Calderón, Nicola Pancotti, Joel Pendleton, Brandon Severin, Charles Etienne Staub, Sara Sussman, Antti Vepsäläinen, Neel Rajeshbhai Vora, Yilun Xu, Varinia Bernales, Daniel Bowring, Elica Kyoseva, Ivan Rungger, Giulia Semeghini, Sam Stanwyck, Timothy Costa, Alán Aspuru-Guzik, Krysta Svore
Research Track A · General AI
Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEval, the first VLM benchmark for quantum …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-04 · Lingxiao Kong, Cong Yang, Oya Deniz Beyan, Zeyd Boukhers
General AI
Despite significant advances in Reinforcement Learning (RL), model performance remains highly sensitive to algorithm and hyperparameter configurations, while generalization gaps across environments complicate real-world deployment. Although prior work has studied RL generalization, the relative contribution of specific…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-04 · Vicente Pelechano, Antoni Mestre, Manoli Albert, Miriam Gil
General AI
Deciding how to distribute work between humans and AI systems is a central challenge in organisational design. Most approaches treat this as a binary choice, yet the operational reality is richer: humans and AI routinely share tasks or take complementary roles depending on context, fatigue, and the stakes involved. Gov…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-04 · Haoquan Fang, Jiafei Duan, Donovan Clay, Sam Wang, Shuo Liu, Weikai Huang, Xiang Fan, Wei-Chuan Tsai, Shirui Chen, Yi Ru Wang, Shanli Xing, Jaemin Cho, Jae Sung Park, Ainaz Eftekhar, Peter Sushko, Karen Farley, Angad Wadhwa, Cole Harrison, Winson Han, Ying-Chun Lee, Eli VanderBilt, Rose Hendrix, Suveen Ellawela, Lucas Ngoo, Joyce Chai, Zhongzheng Ren, Ali Farhadi, Dieter Fox, Ranjay Krishna
General AI
Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency fo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-04 · Fatih Aksu, Laura Ciuffetti, Francesco Di Feola, Filippo Ruffini, Giulia Romoli, Fabrizia Gelardi, Arturo Chiti, Valerio Guarrasi, Paolo Soda
General AI
Accurate histological differentiation between adenocarcinoma (ADC) and squamous cell carcinoma (SCC) is critical for personalized treatment in non-small cell lung cancer (NSCLC). While [$^{18}$F]FDG PET/CT is a standard tool for the clinical evaluation of lung cancer, its utility is often limited by high costs and radi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-07 · Ryan Wang, Akshita Bhagia, Sewon Min
General AI
Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset of experts per inpu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-07 · Yixuan Wang, Dan Guralnik, Warren Dixon
General AI
Safety-critical autonomy in adversarial settings demands more than Lyapunov stability of tracking error signals. An agent executing a goal-directed trajectory is intrinsically legible to a passive observer running online Bayesian inference, because the contractive dynamics of any Lyapunov basin of attraction concentrat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-07 · Amir Ivry
General AI
Large audio language models (LALMs) are increasingly used to reason over long audio clips, yet deployment often compresses audio before inference to reduce memory and latency. The risk is that compression can leave aggregate accuracy acceptable while sharply degrading answers for a deployment-critical query family. We …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-12 · Bokang Yang, Xinyi Sun, Kaituo Feng, Xingping Dong, Dongming Wu, Xiangyu Yue
General AI
Visual perception connects high-level semantic understanding to pixel-level perception, but most existing settings assume that the decisive evidence for identifying a target is already in the image or frozen model knowledge. We study a more practical yet harder open-world case where a visible object must first be resol…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-12 · Gunjan, Sidahmed Benabderrahmane, Talal Rahwan
General AI
Large Language Models (LLMs) can generate fluent political text at scale, raising concerns about synthetic discourse during crises and social conflict. Existing AI-text detection often focuses on sentence-level cues such as perplexity, burstiness, or token irregularities, but these signals may weaken as generative syst…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-22 · S M Mehedi Zaman, Kiran Garimella
General AI
Hundreds of millions of users now hold detailed, multi-turn conversations with ChatGPT and similar LLM assistants. We measure two privacy-relevant features of these conversations on a corpus of complete ChatGPT histories donated by over 1,000 users in four Global South countries (Brazil, India, Nigeria, Pakistan). Firs…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-28 · David Lindner, Victoria Krakovna, Sebastian Farquhar
General AI
We introduce Gram, an automated alignment auditing framework to assess the propensity of AI agents to engage in sabotage. We evaluate Gemini models across 17 simulated agentic deployment scenarios that incentivize sabotage. We find Gemini models misbehave in about 2-3% of our simulated trajectories. Many of these cases…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-28 · Yan Chen, Taojie Zhu, Meng Zhang, Xin Chen, Jiaqi Huang, Dongyang Xu, Yizhi Wang
Research Track A · General AI
Continual supervised fine-tuning (SFT) is the de facto recipe for adapting large language models (LLMs) to a stream of downstream tasks, but it suffers from catastrophic forgetting of earlier capabilities. Recent work shows that on-policy signals (training on the model's own outputs) reduce forgetting more reliably…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-29 · William Hohnen-Ford, Sarah Chen, Kathryn B. Francis, Madeline G. Reinecke, Ilina Singh, David Lyreskog
General AI
Radical Moral Disagreements (RMDs) are highly polarising topics that are increasingly censored in everyday life, with growing evidence suggesting that this polarisation carries measurable costs to public mental health. To address these challenges, some researchers have proposed Large Language Models (LLMs) as a means t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-29 · Nurjahan Sultana, Moi Hoon Yap, Xinqi Fan, Wenqi Lu
General AI
Models for AI-based skin cancer screening suffer a severe performance drop when shifting from expert dermoscopic (source) images to consumer-grade clinical (target) images, hindering real-world deployment. Existing domain adaptation methods often ignore crucial semantic invariants, such as clinical concepts. While new …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-29 · Sindhu B Hegde, K R Prajwal, Andrew Zisserman
General AI
While humans naturally gesture during speech, only a sparse subset of these movements are visually depictive and semantically linked to specific spoken words. Current multimodal models struggle to capture these semantic co-speech gestures, heavily bottlenecked by a lack of precisely annotated training data. To address …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-29 · Yuqing Wang, Zhijie Lin, Ceyuan Yang, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Zihan Ding, Fuyun Wang, Shuai Wang, Youliang Zhang, Haoqi Fan, Xihui Liu
General AI
Unified multimodal models (UMMs) aim to handle perception and generation in a single model. Yet existing UMMs still rely on a frozen, separately pretrained VAE for image generation, imposing a structural bottleneck. Naively removing it introduces a quality gap, as the model must learn both high-level structure and low-…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-29 · Yibin Zhao, Fangxin Shang, Dingrui Yang, Yuqi Wang
General AI
Table question answering requires models to recover semantic relations encoded implicitly by two-dimensional layout, merged cells, and hierarchical headers. Current pipelines typically use HTML or Markdown as intermediate table representations, but these layout-oriented serializations introduce markup overhead and requ…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-05-29 · Ruotong Liao, Guowen Huang, Qing Cheng, Guangyao Zhai, Lei Zhang, Xun Xiao, Thomas Seidl, Daniel Cremers, Volker Tresp
General AI
Text-to-video (T2V) generation faces challenging questions when generating videos with long horizons containing multiple events. Inspired by the intrinsics of the diffusion process, we probe video diffusion transformers (DiTs) and uncover intrinsic turning points in the DiT denoising trajectory where conditioning text …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-06-29 · Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary
General AI
Conservative offline training is widely advocated as a safe foundation for subsequent online adaptation: if a policy stays close to well-supported behaviour, the argument goes, it is less likely to exploit imperfections in a learned reward model. We challenge this intuition empirically and mechanistically. We train a Q…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-06-29 · Seunghun Baek, Jihwan Park, Jaeyoon Sim, Minjae Jeong, Hoseok Lee, Won Hwa Kim
General AI
As real-world prediction systems often face missing modalities at inference, incomplete multimodal learning (IML) remains a practical challenge. While prior methods aim to learn representations robust to missing inputs, representations from incomplete modalities inevitably deviate from their full-modality counterparts …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.6
2026-07-02 · Timo Bertram, Sidhant Bhavnani, Richard Freinschlag, Erich Kobler, Andreas Mayr, Günter Klambauer
General AI
In this work, we focus on SE-RRMs, a symbol-equivariant instantiation of RRMs that exhibits improved extrapolation to larger problem sizes. We propose a neuro-symbolic approach, "Guiding with Recurrent Reasoning Models" (G-RRM), which integrates SE-RRMs with symbolic solvers for constraint satisfaction problems. SE-R…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.6
2026-07-02 · Federico Lincetto, Gianluca Agresti, Mattia Rossi, Piergiorgio Sartor, Pietro Zanuttigh
General AI
Neural rendering techniques allow for accurate reconstruction of the geometry and color appearance of 3D scenes. Some methods have extended their use to additional imaging modalities, such as multispectral, infrared, or polarimetric data. However, all of these approaches require expensive sensors and calibrated setups …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-03-27 · Shihua Zhang, Qiuhong Shen, Shizun Wang, Tianbo Pan, Xinchao Wang
General AI
Empowered by large-scale training, vision-language models (VLMs) achieve strong image and video understanding, yet their ability to perform spatial reasoning in both static scenes and dynamic videos remains limited. Recent advances try to handle this limitation by injecting geometry tokens from pretrained 3D foundation…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-03-30 · Bharath Krishnamurthy, Ajita Rattani
General AI
Recent multimodal face generation models address the spatial control limitations of text-to-image diffusion models by augmenting text-based conditioning with spatial priors such as segmentation masks, sketches, or edge maps. This multimodal fusion enables controllable synthesis aligned with both high-level semantic int…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-02 · Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani
General AI
We consider the question: when a large language reasoning model makes a choice, did it think first and then decide to, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-05 · Xudong Lu, Yang Bo, Jinpeng Chen, Shuhan Li, Xintong Guo, Huankang Guan, Fang Liu, Dunyuan Xu, Peiwen Sun, Heyang Sun, Rui Liu, Hongsheng Li
General AI
Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress, yet current approach…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-06 · Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye
General AI
We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. Unlike traditional GPU-centric systems, MegaTrain stores parameters and optimizer states in host memory (CPU memory) and treats GPUs as transient compute engines. For each layer…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-08 · Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu
General AI
A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jointly shaped by opti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-09 · Luozheng Qin, Jia Gong, Qian Qiao, Tianjiao Li, Li Xu, Haoyu Pan, Chao Qu, Zhiyu Tan, Hao Li
General AI
Unified multimodal models integrating visual understanding and generation face a fundamental challenge: visual generation incurs substantially higher computational costs than understanding, particularly for video. This imbalance motivates us to invert the conventional paradigm: rather than extending understanding-centr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-13 · Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar, Lasha Koroshinadze, Nishit Anand, Zhifeng Kong, Siddharth Gururani, Sang-gil Lee, Jaehyeon Kim, Aya Aljafari, Chao-Han Huck Yang, Sungwon Kim, Ramani Duraiswami, Dinesh Manocha, Mohammad Shoeybi, Bryan Catanzaro, Ming-Yu Liu, Wei Ping
General AI
We present Audio Flamingo Next (AF-Next), the next-generation and most capable large audio-language model in the Audio Flamingo series, designed to advance understanding and reasoning over speech, environmental sounds and music. Compared to Audio Flamingo 3, AF-Next introduces: (i) a stronger foundational audio-languag…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-14 · Sha Sajadieh, Loredana Fattorini, Raymond Perrault, Yolanda Gil, Vanessa Parli, Lapo Santarlasci, Juan Pava, Nestor Maslej, Russ Altman, Erik Brynjolfsson, Carla Brodley, Jack Clark, Virginia Dignum, Vipin Kumar, James Landay, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Elham Tabassi, Russell Wald, Toby Walsh, Dan Weld
General AI
Welcome to the ninth edition of the AI Index report. As AI continues to advance rapidly, the question becomes whether the systems built around it can keep up. Governance frameworks, evaluation methods, education systems, and the data infrastructure needed to track AI's impact are struggling to match the pace of the tec…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.5
2026-04-14 · Ramy E. Ali, Federico Penna
Research Track A
Deploying machine learning (ML) algorithms on mobile phones is bottlenecked by performance degradation under dynamic, real-world conditions that differ from the offline training conditions. While continual learning and adaptation are essential to mitigate this distributional shift, conventional online learning methods …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-15 · Wangjie Gan, Miao Pan, Linbo Xi, Wenqi Zhang, Jintao Chen, Jianwei Yin, Xuhong Zhang
General AI
Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a speci…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-16 · Yixu Huang, Tinghui Zhu, Muhao Chen
General AI
Visual reasoning models (VRMs) have recently shown strong cross-modal reasoning capabilities by integrating visual perception with language reasoning. However, they often suffer from overthinking, producing unnecessarily long reasoning chains for any task. We attribute this issue to Reasoning Path Redundancy in visual…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-16 · Haoyi Sun, Xiaoxiao Wang, Ning Mao, Qian Wang, Lifu Mu, Wen Zheng, Tao Wei, Wei Chen
General AI
Vision-Language Models (VLMs) have shown remarkable capabilities in joint vision-language understanding, but their large scale poses significant challenges for deployment in resource-constrained scenarios. Knowledge Distillation (KD) offers a viable way to improve model capabilities without increasing model size or dat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-17 · Qwen Team
General AI
In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor, Qwen3.5-Omni scales to hundreds of billions of parameters and supports a 256k context length. By leveraging a massive dataset comprising heterogeneous text-vision pairs…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-19 · Yuezhou Hu, Jintao Zhang
General AI
Autoregressive video diffusion is emerging as a promising paradigm for streaming video synthesis, with step distillation serving as the primary means of accelerating inference. Whether speculative decoding, the dominant acceleration strategy for large language models, can be effectively adapted to autoregressive video …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-20 · Rongyuan Tan, Jue Zhang, Zhuozhao Li, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
General AI
Interpretability tools are increasingly used to analyze failures of Large Language Models (LLMs), yet prior work largely focuses on short prompts or toy settings, leaving their behavior on commonly used benchmarks underexplored. To address this gap, we study contrastive, LRP-based attribution as a practical tool for an…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-21 · Chaitanya Dwivedi, Binxuan Huang, Himanshu Gupta, Pratik Jayarao, Neeraj Varshney, Bing Yin
General AI
Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality scales predictably with total parameters, an…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-21 · Ying Zeng, Miaosen Luo, Guangyuan Li, Yang Yang, Ruiyang Fan, Linxiao Shi, Qirui Yang, Jian Zhang, Chengcheng Liu, Siming Zheng, Jinwei Chen, Bo Li, Peng-Tao Jiang
General AI
Traditional photographic image editing typically requires users to possess sufficient aesthetic understanding to provide appropriate instructions for adjusting image quality and camera parameters. However, this paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or i…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-24 · Harshit Joshi, Priyank Shethia, Jadelynn Dao, Monica S. Lam
General AI
Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections grow. A common workaround is to decompose documents into chunks and assemble answers from…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.5
2026-04-24 · Hillary Mutisya, John Mugane
Research Track A · General AI
We investigate whether neural models trained exclusively on modern morphological data can recover cross-lingual lexical structure consistent with historical reconstruction. Using BantuMorph v7, a transformer over Bantu morphological paradigms, we analyze 14 Eastern and Southern Bantu languages, extract encoder embeddin…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-04-29 · Zhen Zhang, Changyi Yang, Zijie Xia, Zhen Yang, Chengzhi Liu, Zhaotiao Weng, Yepeng Liu, Haobo Chen, Jin Pan, Chenyang Zhao, Yuheng Bu, Alkesh Patel, Zhe Gan, Xin Eric Wang
General AI
The token serves as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introd…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-05-01 · Minghui Chen, Chenxu Yang, Hengjie Zhu, Dayan Wu, Zheng Lin, Qingyi Si
General AI
Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Pe…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-05-01 · Houyuan Chen, Hong Li, Xianghao Kong, Tianrui Zhu, Shaocong Xu, Weiqing Xiao, Yuwei Guo, Chongjie Ye, Lvmin Zhang, Hao Zhao, Anyi Rao
General AI
Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-05-06 · Yu-Cheng Lin, Yu-Chao Hsu, I-Shan Tsai, Chun-Hua Lin, Kuo-Chung Peng, Jiun-Cheng Jiang, Yun-Yuan Wang, Tzung-Chi Huang, Tai-Yue Li, Kuan-Cheng Chen, Samuel Yen-Chi Chen, Nan-Yow Chen
General AI
High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum circuit simulation, and selected configuration interaction postprocessing. We present the generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE), a parameter-ef…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-05-26 · Yonghoon Dong, Kyungmin Lee, Changyeon Kim, Jaehyuk Kim, Jinwoo Shin
General AI
Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of optimization arising from the multi-step sampling process. Recently, Q-learning with Adjoint Matching (QAM) addressed this issue by reformulating into a memoryless stochastic optimal control (SOC) problem with a …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-06-01 · Ziwen Li, Jianing Wen, Tianshi Li
General AI
Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry downstream analytic value of the text. Existing defenses either remove explicit identifiers, perturb text for formal privacy,…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-06-02 · Sanket Badhe, Deep Shah
General AI
Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To addres…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-06-04 · Shaoyang Xu, Jingshen Zhang, Long P. Hoang, Jinyuan Li, Wenxuan Zhang
General AI
Multicultural multi-agent systems are increasingly deployed in globally diverse settings, where different agents are grounded in different cultural backgrounds. Existing cultural evaluation focuses on value alignment: how closely a single agent matches a target culture. Yet alignment is a per-agent property and cannot …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-06-04 · Gianluca Barmina, Peter Schneider-Kamp, Lukas Galke Poech
General AI
Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based capability attacks w…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-06-04 · Wenbo Pan, Shujie Liu, Chin-Yew Lin, Jingying Zeng, Xianfeng Tang, Xiangyang Zhou, Yan Lu, Xiaohua Jia
General AI
AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-06-08 · Jiaqi Yan, Xiangyu Chen, Xinlin Zhong, Haibin Huang, Chi Zhang, Jie Liu, Jiantao Zhou, Xuelong Li
General AI
Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions and the latency-memory…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.5
2026-06-10 · Stephen Kasica, Charles Berret, Tamara Munzner
Research Track A
Data journalists routinely integrate records across multiple independently published sources to support accountability reporting, yet no existing interactive wrangling tool treats the collection of tables -- rather than the single table -- as its primary unit of work. We present OpenRoundup, an open-source, browser-bas…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.5
2026-06-11 · Víctor Blanco, J. Fernando Camacho-Vallejo, Yolanda Hinojosa
Research Track A
Urban waste management faces increasing operational and environmental challenges driven by population growth, heterogeneous waste streams, traffic congestion, and the need for sustainable collection infrastructures. We present an integrated optimization framework for the design of multi-type urban waste collection and …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.5
2026-06-14 · Qing Su, Kaiyang Li, Yuan Zhuang, Fei Miao, Shihao Ji
Research Track A · General AI
While video segmentation has advanced rapidly on short clips and closed-set benchmarks, open-world video segmentation remains largely unexplored. The challenge is twofold: (1) existing methods are not designed to support object discovery and identity maintenance in long videos of dynamic ego-motion, and (2) existing ev…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-06-15 · Filip Sondej, Yushi Yang, Adam Mahdi
General AI
Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-shot prompting, suggesting their forgetting is only shallow. We identify the root cause. …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-06-16 · Haozhe Chen, Karthik Narasimhan, Zhuang Liu
General AI
Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring informa…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.5
2026-06-16 · Kairos Team, Fei Wang, Shan You, Qiming Zhang, Tao Huang, Zuoyi Fu, Zhisheng Zheng, Yunlong Xi, Feng Lv, Xiaoming Wu, Zeyu Liu, Cong Wan, Pu Li, Ruiqing Yang, Xiaoou Li, Wei Wang, Kangkang Zhu, Yuwei Zhang, Shi Fu, Zheng Zhang, Xiaoning Wu, Xuzeng Fan, Dacheng Tao, Xiaogang Wang
General AI
World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constraints. We introduce Kai…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.5
2026-06-16 · Xiongjun Guan, Jianjiang Feng, Jie Zhou
Research Track A
Fingerprint recognition is still dominated by task-specific pipelines, where enhancement, structural parsing, alignment, and matching are optimized in isolation. Although effective in narrow settings, this design limits representation reuse across sensors, qualities, and downstream applications. We therefore present Uo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.5
2026-06-17 · Wenqi Jia, Zhewen Hu, Ying Huang, Yu Gong, Stavros Kalafatis, Yuke Wang, Wei Niu, Chengming Zhang, Ang Li, Sheng Di, Yuede Ji, Bo Fang, Miao Yin
Research Track A
3D Gaussian Splatting (3DGS) enables high-fidelity and real-time 3D scene reconstruction, but scaling training to large-scale scenes requires optimizing hundreds of millions of Gaussians across multiple GPUs. Existing distributed approaches either partition scenes into isolated regions, causing global inconsistency, or…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-03-26 · Xiaofeng Mao, Shaohao Rui, Kaining Ying, Bo Zheng, Chuanhao Li, Mingmin Chi, Kaipeng Zhang
General AI
Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-03-31 · Max Kaufmann, David Lindner, Roland S. Zimmermann, Rohin Shah
General AI
Chain-of-Thought (CoT) monitoring, in which automated systems monitor the CoT of an LLM, is a promising approach for effectively overseeing AI systems. However, the extent to which a model's CoT helps us oversee the model - the monitorability of the CoT - can be affected by training, for instance by the model learning …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-03-31 · Nathan Heath
General AI
Myopic Optimization with Non-myopic Approval (MONA) mitigates multi-step reward hacking by restricting the agent's planning horizon while supplying far-sighted approval as a training signal (Farquhar et al., 2025). The original paper identifies a critical open question: how the method of constructing approval -- partic…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-03-31 · Gianluca Aguzzi, Davide Domini, Nicolas Farabegoli, Mirko Viroli
General AI
Aggregate programming is a field-based coordination paradigm with over a decade of exploration and successful applications across domains including sensor networks, robotics, and IoT, with implementations in various programming languages, such as Protelis, ScaFi (Scala), and FCPP (C++). A recent research direction inte…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-03-31 · Iain Swift, JingHua Ye, Ruairi O'Reilly
General AI
Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-03-31 · Qiyuan Zhuang, He-Yang Xu, Yijun Wang, Xin-Yang Zhao, Yang-Yang Li, Xiu-Shen Wei
General AI
Understanding object affordances is essential for enabling robots to perform purposeful and fine-grained interactions in diverse and unstructured environments. However, existing approaches either rely on retrieval, which is fragile due to sparsity and coverage gaps, or on large-scale models, which frequently mislocaliz…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-03-31 · Ming-Hua Tsai, Phat Tran
General AI
This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and e…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-03-31 · Iain Swift, JingHua Ye
General AI
Multimodal deep learning has improved prognostic accuracy for brain tumours by integrating histopathology and genomic data, yet the contribution of volumetric MRI within unified survival frameworks remains unexplored. This pilot study extends a bimodal framework by incorporating Fluid Attenuated Inversion Recovery (FLA…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-02 · Merve Karakas, Osama Hanna, Lin F. Yang, Christina Fragouli
General AI
In this paper, we consider a multi-armed bandit (MAB) instance and study how to identify the best arm when arm commands are conveyed from a central learner to a distributed agent over a discrete memoryless channel (DMC). Depending on the agent capabilities, we provide communication schemes along with their analysis, wh…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-02 · Daiwei Chen, Zhoutong Fu, Chengming Jiang, Haichao Zhang, Ran Zhou, Tan Wang, Chunnan Yao, Guoyao Li, Rui Cai, Yihan Cao, Ruijie Jiang, Fedor Borisyuk, Jianqiang Shen, Jingwei Wu, Ramya Korlakai Vinayak
General AI
Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies on supervised fine-tuning to learn their …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-02 · Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano
General AI
We present ModMap, a natively multiview and multimodal framework for 3D anomaly detection and segmentation. Unlike existing methods that process views independently, our method draws inspiration from the crossmodal feature mapping paradigm to learn to map features across both modalities and views, while explicitly mode…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-06 · Daron Acemoglu, Tianyi Lin, Asuman Ozdaglar, James Siderius
General AI
Artificial intelligence (AI) changes social learning when aggregated outputs become training data for future predictions. To study this, we extend the DeGroot model by introducing an AI aggregator that trains on population beliefs and feeds synthesized signals back to agents. We define the learning gap as the deviation…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-06 · Justin Curry, Alberto Speranzon
General AI
In this paper, we develop a stratification-based semantics for Signal Temporal Logic (STL) in which each atomic predicate is interpreted as a membership test in a stratified space. This perspective reveals a novel correspondence principle between stratification theory and STL, showing that most STL formulas can be view…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-07 · Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang
General AI
Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly sensitive to configuration choices. In practice, this leads to many concurrent LoRA …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-07 · Juyeong Hwang, Seong-Eun Hong, Jinhyun Kim, JaeYoung Seon, Giljoo Nam, Hanyoung Jang, HyeongYeop Kang
General AI
Crowds do not merely move; they decide. Human navigation is inherently contextual: people interpret the meaning of space, social norms, and potential consequences before acting. Sidewalks invite walking, crosswalks invite crossing, and deviations are weighed against urgency and safety. Yet most crowd simulation methods…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-07 · Tianyi Liu, Yiming Li, Wenqian Wang, Jiaojiao Wang, Chen Cai, Yi Wang, Kim-Hui Yap
General AI
Robust multimodal visual analytics remains challenging when heterogeneous modalities provide complementary but input-dependent evidence for decision-making. Existing multimodal learning methods mainly rely on fixed fusion modules or predefined cross-modal interactions, which are often insufficient to adapt to changing m…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-09 · Kooktae Lee
General AI
This paper addresses the decentralized non-uniform area coverage problem for multi-agent systems, a critical task in missions with high spatial priority and resource constraints. While existing density-based methods often rely on computationally heavy Eulerian PDE solvers or heuristic planning, we propose Stochastic De…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-09 · Joungbin An, Agrim Jain, Kristen Grauman
General AI
Video temporal grounding (VTG) is typically tackled with dataset-specific models that transfer poorly across domains and query styles. Recent efforts to overcome this limitation have adapted large multimodal language models (MLLMs) to VTG, but their high compute cost and limited video context still hinder long-video gr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-09 · Mohamed Amine Kerkouri, Marouane Tliba, Bin Wang, Aladine Chetouani, Ulas Bagci, Alessandro Bruno
General AI
Scanpath similarity metrics are central to eye-movement research, yet existing methods predominantly evaluate spatial and temporal alignment while neglecting semantic equivalence between attended image regions. We present a semantic scanpath similarity framework that integrates vision-language models (VLMs) into eye-tr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-10 · Gyuwon Park, DongIl Shin, SolGil Oh, SangGi Ryu, Byung-Hak Kim
General AI
The rapid evolution of Large Language Models (LLMs) has significantly impacted the field of natural language processing, but their growing complexity raises concerns about resource usage and transparency. Addressing these challenges, we participated in the NeurIPS LLM Efficiency Challenge, aiming to fine-tune a foundat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-13 · WonJin Yoon, Kangyu Zhu, Ian Bulovic, Autumn Sehy, Yanjun Gao, Dmitriy Dligach, Majid Afshar, Timothy A. Miller
Research Track A · General AI
With the recent progress of Large Language Models (LLMs), there is a growing interest in applying these models to solve complex and challenging problems. Modern LLMs, capable of processing long contexts and generating verbalized explanations, offer significant potential in addressing real-world applications. However, a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-13 · Lyuxing He, Eric Cai, Shobhit Aggarwal, Jianjun Wang, David Held
General AI
Recent advances in robotic manipulation have highlighted the effectiveness of learning from demonstration. However, while end-to-end policies excel in expressivity and flexibility, they struggle both in generalizing to novel object geometries and in attaining a high degree of precision. An alternative, object-centric a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-14 · Kathakoli Sengupta, Kai Ao, Paola Cascante-Bonilla
General AI
Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene graphs, yet evaluation still relies on LLM or VLM judges that score rendered views, making judgments sensitive to viewpoint, prompt phrasing, and hallucination. Wh…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-15 · Yarui Cao, Kai Liu
General AI
Fine-tuning large language models (LLMs) aims to adapt pre-trained models to specific tasks using relatively small and domain-specific datasets. Among Parameter-Efficient Fine-Tuning (PEFT) methods, Low-Rank Adaptation (LoRA) stands out by matching the performance of full fine-tuning while avoiding additional inference…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-16 · Giacomo Franchini, David Rodríguez-Martínez, Alfonso Martínez-Petersen, C. J. Pérez-del-Pulgar, Marcello Chiaberge
General AI
Autonomous robots operating in natural karstic caves face perception and navigation challenges that are qualitatively distinct from those encountered in mines or tunnels: irregular geometry, reflective wet surfaces, near-zero ambient light, and complex branching passages. Yet publicly available datasets targeting this …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-16 · Yao Tong, Jiayuan Ye, Anastasia Borovykh, Reza Shokri
General AI
Whether language models can systematically generalize remains actively debated. Yet empirical performance is jointly shaped by multiple factors such as training data, training paradigms, and inference-time strategies, making failures difficult to interpret. We introduce a controlled synthetic environment based on short…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-16 · Mengdi Wu, Xiaoyu Jiang, Oded Padon, Zhihao Jia
General AI
This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constru…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-16 · Yiyang Jiang, Li Zhang, Xiao-Yong Wei, Li Qing
General AI
Many SLT systems quietly assume that brief chunks of signing map directly to spoken-language words. That assumption breaks down because signers often create meaning on the fly using context, space, and movement. We revisit SLT and argue that it is mainly a cross-modal reasoning task, not just a straightforward video-to…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-17 · Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin
General AI
We propose HILBERT (HIerarchical Long-sequence Balanced Embedding with Reciprocal contrastive Training), a cross-attentive multimodal framework for learning document-level audio-text representations from long, segmented sequences in low-resource data settings. HILBERT leverages frozen pre-trained speech and language en…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-17 · Xiangbo Gao, Sicong Jiang, Bangya Liu, Xinghao Chen, Minglai Yang, Siyuan Yang, Mingyang Wu, Jiongze Yu, Qi Zheng, Haozhi Wang, Jiayi Zhang, Jared Yang, Jie Yang, Zihan Wang, Qing Yin, Zhengzhong Tu
General AI
As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured footage to meet professional requirements. Yet the field still lacks both a large-scale human-annotated dataset with complete editing examples and a standardized evaluat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-20 · Kevin Murphy
General AI
We present BLF (Bayesian Linguistic Forecaster), an agentic system for binary forecasting that achieves state-of-the-art performance on the ForecastBench benchmark. The system is built on three ideas. (1) A Bayesian linguistic belief state: a semi-structured representation combining numerical probability estimates with…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-20 · Sijie Mai, Shiqin Han
General AI
Multimodal affective computing aims to predict humans' sentiment, emotion, intention, and opinion using language, acoustic, and visual modalities. However, current models often learn spurious correlations that harm generalization under distribution shifts or noisy modalities. To address this, we propose a causal modali…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-21 · Yusuf Çelebi, Yağız Asker, Özay Ezerceli, Mahmoud ElHussieni, Selva Taş, Reyhan Bayraktar, Fatma Betül Terzioğlu
General AI
Fine-tuning Large Language Models (LLMs) remains structurally uncertain despite parameter-efficient methods such as Low-Rank Adaptation (LoRA), as the layer-specific roles of internal representations are poorly understood, leading to heuristic decisions about where adaptation should be applied. We model the evolution o…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-21 · Zewei Zhou, Ruining Yang, Xuewei Qi, Yiluan Guo, Sherry X. Chen, Tao Feng, Kateryna Pistunova, Yishan Shen, Lili Su, Jiaqi Ma
General AI
Vision-Language-Action (VLA) models offer a promising autonomous driving paradigm for leveraging world knowledge and reasoning capabilities, especially in long-tail scenarios. However, existing VLA models often struggle with the high latency in action generation using an autoregressive generation framework and exhibit …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-22 · Mariano Barone, Francesco Di Serio, Roberto Moio, Marco Postiglione, Giuseppe Riccio, Antonio Romano, Vincenzo Moscato
General AI
Large Language Models (LLMs) are increasingly deployed in healthcare, yet their communicative alignment with clinical standards remains insufficiently quantified. We conduct a multidimensional evaluation of general-purpose and domain-specialized LLMs across structured medical explanations and real-world physician-patie…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-22 · Hyeonwoo Kim, Jeonghwan Kim, Kyungwon Cho, Hanbyul Joo
General AI
Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object categories, including complex dexterous manipulations that are difficult to capture with motion capture systems. While the rich interaction knowledge embedded in these…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-22 · Joachim Baumann, Vishakh Padmakumar, Xiang Li, John Yang, Diyi Yang, Sanmi Koyejo
General AI
AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild. The dataset currently contai…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-22 · Yiming Bian, Joshua M. Akey
General AI
The scalability of long-context large language models is fundamentally limited by the quadratic memory cost of exact self-attention, which often leads to out-of-memory (OOM) failures on modern hardware. Existing methods improve memory efficiency to near-linear complexity, while assuming that the full query, key, and va…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-23 · Yang Hu, Vladyslav Turlo
General AI
The FAIR principles have transformed how computational data and workflows are shared in materials research, yet existing repositories can only serve pre-computed entries -- broad coverage is perpetually incomplete and cannot adapt to new questions on demand. To address these challenges, we present OptiMat Alloys, a lar…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-24 · Yuqi Pan, Jinghao Zhuang, Yupeng Feng, Fangzhi Zhong, Siyu Ding, Xuerui Qiu, Shaowei Gu, Bohan Sun, Zhiyong Qin, Yibo Zhong, Lingtao Ouyang, Kun Yang, Zehao Liu, Yuhong Chou, Shurong Wang, Anjie Hu, Han Xu, Bo Xu, Guoqi Li
General AI
Scaling context length is reshaping large-model development, yet full-attention Transformers suffer from prohibitive computation and inference bottlenecks at long sequences. A key challenge is to design foundation models that maintain performance and long-context efficiency with minimal training overhead. We introduce …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-24 · Keshav Ramji, Tahira Naseem, Ramón Fernandez Astudillo
General AI
While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate during inference. Non-verbal reasoning methods have emerged with shorter generation lengths by leveraging continuous representations, yet their performance lags behind verbalized CoT. We propose $\…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-27 · Aaryan Shah, Andrew Hines, Alexia Downs, Denis Bajet, Paulius Mui, Fabiano Araujo, Laura Offutt, Aida Rutledge, Elizabeth Jimenez
General AI
Objective. Clinical AI documentation systems require evaluation methodologies that are clinically valid, economically viable, and sensitive to iterative changes. Methods requiring expert review per scoring instance are too slow and expensive for safe, iterative deployment. We present a case-specific, clinician-authored…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-27 · Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin
General AI
Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input pro…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-27 · Lixian Chen, Mingxuan Huang, Yanhui Chen, Junyi Lin, Yang Shi
General AI
Vision-language models transfer well in zero-shot settings, but at deployment the visual and textual branches often shift asymmetrically. Under this condition, entropy-based test-time adaptation can sharpen the fused posterior while increasing error, because an unreliable modality may still dominate fusion. We study th…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-27 · Zhiheng Liu, Weiming Ren, Xiaoke Huang, Shoufa Chen, Tianhong Li, Mengzhao Chen, Yatai Ji, Sen He, Jonas Schult, Belinda Zeng, Tao Xiang, Wenhu Chen, Ping Luo, Luke Zettlemoyer, Yuren Cong
General AI
Unified multimodal models typically rely on pretrained vision encoders and use separate visual representations for understanding and generation, creating misalignment between the two tasks and preventing fully end-to-end optimization from raw pixels. We introduce Tuna-2, a native unified multimodal model that performs …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-28 · Jan Dubiński, Jan Betley, Anna Sztyber-Betley, Daniel Tan, Owain Evans
General AI
Finetuning a language model can lead to emergent misalignment (EM) [Betley et al., 2025b]. Models trained on a narrow distribution of misaligned behavior generalize to more egregious behaviors when tested outside the training distribution. We study a set of interventions proposed to reduce EM. We confirm that these int…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-28 · Lucio La Cava, Andrea Tagarelli
General AI
Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-28 · An Nguyen, Hoang Nguyen, Phuong Le, Hung Pham, Cuong Do, Laurent El Ghaoui
General AI
We study city-scale control of electric-vehicle (EV) ride-hailing fleets where dispatch, repositioning, and charging decisions must respect charger and feeder limits under uncertain, spatially correlated demand and travel times. We formulate the problem as a hex-grid semi-Markov decision process (semi-MDP) with mixed a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-29 · Youyuan Zhang, Jialiang Sun, Hangrui Bi, Chuqin Geng, Wenjie Ma, Zhaoyu Li, Xujie Si
General AI
We introduce DreamProver, an agentic framework that leverages a "wake-sleep" program induction paradigm to discover reusable lemmas for formal theorem proving. Existing approaches either rely on fixed lemma libraries, which limit adaptability, or synthesize highly specific intermediate lemmas tailored to individual the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-29 · Fangqiang Fan, Zhicheng Zhao, Xiaoliang Ma, Chenglong Li, Jin Tang
General AI
Fine-grained RGBT image semantic segmentation is crucial for all-weather unmanned aerial vehicle (UAV) scene understanding. However, UAV RGBT semantic segmentation faces two coupled challenges: cross-modal spatial misalignment caused by sensor parallax and platform vibration, and severe semantic confusion among fine-gr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-29 · Wenxuan Ye, Yangyang Zhang, Xueli An, Georg Carle, Yunpu Ma
General AI
Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls intro…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-04-29 · Lingfeng Zhang, Xiaoshuai Hao, Xizhou Bu, Yingbo Tang, Hongsheng Li, Jinghui Lu, Xiu-shen Wei, Jiayi Ma, Yu Liu, Jing Zhang, Hangjun Ye, Xiaojun Liang, Long Chen, Wenbo Ding
General AI
Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-h…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-04 · Bin Wen, Tien-Ping Tan
General AI
Multimodal sentiment analysis (MSA) infers human affect from language, acoustic, and visual signals. Recent methods increasingly adapt large multimodal models (LMMs) via generative readout: prompting the model to emit a sentiment score as a text string. While convenient, this ties continuous regression to discrete auto…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-04 · Liliana Hotsko, Yinxi Li, Yuntian Deng, Pengyu Nie
General AI
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolving codebases. We in…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-04 · Luzhe Sun, Jingtian Ji, Haoran Chen, Jiawei Zhou, Matthew R. Walter
General AI
Leveraging prior knowledge from pretrained policies, foundation models, or human operators offers an efficient alternative to learning robot skills from scratch. However, these agents often provide actions that are suboptimal, noisy, or misaligned with task-specific expert behavior. We propose GLOVES, a family of flow-…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-04 · Heng-Jui Chang, Alexander H. Liu, Saurabhchand Bhati, Mrudula Athi, Anton Ratnarajah, Amit Chhetri, James Glass
General AI
Audio encoders are critical to modern audio applications as large language models (LLMs) increasingly rely on a single encoder for diverse inputs. While self-supervised learning (SSL) has yielded strong domain-specific encoders like speech or music experts, multi-domain approaches like USAD and SPEAR remain limited in …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-08 · Shizhe Lin, Ladan Tahvildari
General AI
Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle. However, system reliability remains hindered by LLM hallucinations and error propagation across interacting agents. While semantic entropy provides a principled way to quan…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-08 · Tianyu Lin, Jooyoung Ryu, Puvada Sreevarsha, Rahul Srinivasaragavan, Riya Satavlekar, Susan Kim, Nidhi Soley, Yujie Yan, Ishan Vatsaraj, Carl Harris, Aimon Rahman, Vishal Patel, Joseph Greenstein, Casey Taylor, Kemar E. Green
General AI
Eye movements, including saccades, are widely regarded as highly sensitive and objective biomarkers of neurophysiologic states. Detecting saccadic signatures in neurologic diseases offers a rapid, portable alternative to brain imaging, avoiding access and cost barriers. Currently, there are no robust AI-enabled video-o…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-08 · Weijie Wang, Haoyu Zhao, Yifan Yang, Feng Chen, Zeyu Zhang, Yefei He, Zicheng Duan, Donny Y. Chen, Yuqing Yang, Bohan Zhuang
General AI
Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel space discards rich …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-08 · Gianluca Barmina, Federico Torrielli, Sven Harms, Jacob Nielsen, Felix Mächtle, Stine Lyngsø Beltoft, Peter Schneider-Kamp, Thomas Eisenbarth, Lukas Galke Poech, Anne Lauscher
General AI
Large language models (LLMs) routinely face requests that should be refused, creating a trade-off between helpfulness and harm prevention. However, refusals themselves can be helpful. In high-risk interactions involving crisis, coercion, or escalating intent, blunt non-compliance may prevent direct harm while still fai…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-08 · Qin Yang, Lu Malloy, Joshua Lee, Xiaohan Chang, Meisam Mohammady, Doowon Kim, Yuan Hong
General AI
Large language model (LLM)-powered content moderation systems have become a critical defense against harmful online content. However, these systems primarily operate on tokenized text and largely ignore the visual cues that humans naturally rely on when interpreting content. We show that this discrepancy creates a fund…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-09 · Evgenii Kortukov, Piotr Komorowski, Florian Klein, Paula Engl, Gabriele Sarti, Seong Joon Oh, Sebastian Lapuschkin, Wojciech Samek
General AI
Deployed large reasoning models (LRMs) often behave unexpectedly. Test-time steering controls LRM outputs by intervening on their hidden representations, but it can degrade output quality. We argue that prior steering work implicitly relies on internal features that detect behavior in already generated text. We show th…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-09 · Zhengxuan Wei, Yi Dong, Zonghui Li, Xianhui Lin, Xing Liu, Hong Gu, Shaofeng Zhang, Wenbin Li, Qi Fan
General AI
Low-Rank Adaptation (LoRA) merging can efficiently combine diverse generative capabilities from multiple trained LoRAs for a diffusion model. However, existing LoRA merging techniques often suffer from severe parameter interference, causing destructive collisions in the shared parameter space. To address this, we propo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-11 · Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger, Stefan Rose, Sarah Ball, Bolei Ma, Frauke Kreuter, Markus Weinmann, Stefan Feuerriegel
General AI
Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches are resource-intensive and difficult to scale. Here, we show that large language models (LLMs) can a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-11 · Elijah Cadenhead, Cristian McGee, Xin Li, El Houcine Bergou, Aritra Dutta
General AI
Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how the structural restrictions on low-rank updates preserve effective adaptation performanc…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-11 · Jialin Gan, Xin Qiu, Guangzhe Chen, Xue Wang
Research Track A · General AI
Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamentally different information structures, making uniform token processing inefficient. In this paper, we…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-15 · Amdjed Belaref, Samir Sadok, Zineb Noumir, Renaud Seguier
General AI
Affective computing increasingly relies on deep learning to represent emotions, yet latent spaces often remain opaque, high-dimensional black boxes. This paper investigates whether Transformers' embeddings recover the geometric regularities of Russell's circumplex model. We unify two complementary experiments testing t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-15 · Abbas Mammadov, Ozgur Kara, Kaan Oktay, Iskander Azangulov, Adil Kaan Akan, Hyungjin Chung, James Matthew Rehg, Yee Whye Teh
General AI
Diffusion and flow-based models learn powerful data priors by training a denoiser to reverse Gaussian corruption. To use this prior to solve a linear inverse problem, one needs to sample from the posterior, but the score that the prior provides is the unconditional score, not the posterior score. Existing methods eithe…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-15 · Xueluan Gong, Chen Chen, Jinxin Liu, Qian Wang, Kwok-Yan Lam
General AI
Foundation models are reshaping robotics by enabling robots to interpret open-ended instructions, reason over multimodal contexts, and operate in complex, open-world environments. However, their integration also introduces security and privacy (S&P) risks that extend beyond the FMs themselves to embodied execution pipe…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-16 · Abir Ashab Niloy, Ahmed Ryan, Imamul Hossain Rafi, Md Erfan, Md Rayhanur Rahman
General AI
Multi-stage cyberattacks span system, network, and browser logs. Detecting them requires correlating events across all three sources. Machine learning methods can learn these cross-source patterns, but they need labeled multi-source data. Existing public datasets fall short. Network-only datasets such as CICIDS and UNS…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-16 · Zheqi Dai, Guangyan Zhang, Zhen Ye, Jingyu Li, Haolin He, Chunyat Wu, Yiwen Guo, Qiuqiang Kong
General AI
Neural audio codecs are central to modern LLM-based Text-to-Speech (TTS) and multimodal systems. As low-bitrate semantic codecs gain prominence, the Token-to-Waveform (Token2Wav) decoder becomes a bottleneck determining both perceptual quality and system efficiency. Conventional multi-step flow-matching decoders offer …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-17 · Michael Finkelson, Daniel Segal, Eitan Richardson, Shahar Armon, Nani Goldring, Poriya Panet, Nir Zabari, Benjamin Brazowski, Or Patashnik, Yoav HaCohen
General AI
Existing multi-speaker dialogue systems bind speakers to utterances through structured supervision: per-turn tags, multi-stream transcriptions, or learnable speaker embeddings. These systems operate within speech-only pipelines that produce clean vocal sequences without the ambient texture of real conversations. We tak…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-17 · Jiaqing Zhang, Sabyasachi Bandyopadhyay, Miguel Contreras, Jessica Sena, Yuanfang Ren, Andrea Davidson, Ziyuan Guan, Tezcan Ozrazgat-Baslanti, Subhash Nerella, Azra Bihorac, Parisa Rashidi
General AI
Delirium is a common and serious complication in the Intensive Care Unit (ICU), associated with increased morbidity, prolonged hospital stays, and higher healthcare costs. Despite its prevalence, early prediction and prevention remain challenging. Environmental factors such as ambient sound and light may influence the …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.2
2026-06-15 · Jaehun Jung, Ximing Lu, Brandon Cui, Muhammad Khalifa, Shaokun Zhang, Hao Zhang, Jin Xu, Amala Sanjay Deshmukh, Karan Sapra, Andrew Tao, Yejin Choi, Jan Kautz, Mingjie Liu, Yi Dong
Research Track B · General AI
Training computer-use agents (CUAs) -- models that interact with graphical desktops through screenshots and keyboard/mouse actions -- requires large-scale, diverse trajectory data collected in full desktop environments. The largest public resource, AgentNet (22.5K human trajectories), leads to negative transfer when us…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-06-23 · Anna Fang
General AI
Poorly designed interventions or those deployed without adequate safeguards can harm the communities they aim to serve, thus exacerbating existing vulnerabilities and leaving individuals unsupported. This is especially the case for the mental health context, where there is a growing trend of relying on technological in…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.2
2026-06-24 · Eyasu Getahun Chekole, Howard Halim, Daniël Reijsbergen, Jianying Zhou
General AI
Biometric authentication systems are increasingly deployed in security-critical applications, yet existing physiological and behavioral biometrics suffer from fundamental limitations: 1) they are vulnerable to spoofing attacks due to unreliable liveness detection, 2) biometric templates may leak privacy-sensitive infor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-05-01 · Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd, Nathaniel Li, Ziwen Han, Jean-Christophe Testud, Saisuke Okabayashi, Maeve Ryan, Jinpeng Miao, Hamza Kwisaba, Felix Binder, Spencer Whitman, Jim Gust, Esteban Arcaute, Dhaval Kapil, Jacob Kahn, Ayaz Minhas, Tristan Goodman, Lauren Deason, Alexander Vaughan, Shengjia Zhao, Summer Yue
General AI
This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned pro…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-05-03 · Shuzheng Si, Haozhe Zhao, Yu Lei, Qingyi Wang, Dingwei Chen, Zhitong Wang, Zhenhailong Wang, Kangyang Luo, Zheng Wang, Gang Chen, Fanchao Qi, Minjia Zhang, Maosong Sun
General AI
Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-05-06 · Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov, Mikhail Kulakov
General AI
We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned har…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.0
2026-05-07 · Zhengru Fang, Yanan Ma, Yu Guo, Senkang Hu, Yixian Zhang, Hangcheng Cao, Wenbo Ding, Yuguang Fang
Research Track A · General AI
When a chest X-ray shows consolidation but the question asks which finding is present, a medical vision-language model may answer "No consolidation." This is more than an incorrect choice: it is a polarity reversal that emits a clinical statement contradicting the image. We study this failure as negated-option attracti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-05-07 · Ajay Jaiswal, Lauren Hannah, Han-Byul Kim, Duc Hoang, Mehrdad Farajtabar, Minsik Cho
General AI
We revisit a universally accepted but under-examined design choice in every modern LLM: a token index is looked up once at the input embedding layer and then permanently discarded. This single-injection assumption induces two structural failures: (i) the Rare Token Problem, where a Zipf-type distribution of vocabulary …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.0
2026-05-10 · Roni Blushtein-Livnon, Tal Svoray, Itay Fischhendler, Havatzelet Yahel, Emir Galilee
Research Track A
In traditional rural societies, where social ties are embedded in physical space, the diffusion of emerging technologies may be amplified through socio-spatial contagion (SSC). Such processes may play a key role in accelerating residential PV adoption in off-grid regions. Yet empirical evidence on SSC in PV adoption re…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-05-12 · Xinjie Shen, Rongzhe Wei, Peizhi Niu, Haoyu Wang, Ruihan Wu, Eli Chien, Bo Li, Pin-Yu Chen, Pan Li
General AI
Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful objective in a single prompt, increasingly capable attackers can distribute their intent across multiple benign-looking turns. Recent studies show that even modern commercial mo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.0
2026-05-29 · Arbaz Khan, Jeonghun J. Lee, Harpal Singh
Research Track A
In this paper, we propose and analyze a novel two-field symmetric formulation with solid displacement and fluid pressure as main unknowns for Biot's consolidation model in poroelasticity. Firstly, we prove the well-posedness of the new formulation and then show the existence and uniqueness of optimal control where …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 7.0
2026-06-18 · Luca Zedda, Davide Antonio Mura, Cecilia Di Ruberto, Maurizio Atzori, Muhammed Furkan Dasdelen, Carsten Marr, Andrea Loddo
General AI
Attention-based Multiple Instance Learning aggregators in medical imaging are prone to attention concentration, producing overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this through four synergistic architectural components: RMSNorm-based pre-normalization, pe…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-02 · Shuaipeng Zhou, Yu Zhang
General AI
Libraries of Low-Rank Adaptation (LoRA) adapters are becoming a practical by-product of parameter-efficient adaptation. Once such adapters accumulate, a natural question is no longer how to train one adapter for one task, but how to reuse an open pool of adapters for a new task given only a small support set. Prior wor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-04 · Mario Rodríguez Béjar, B. Romera-Paredes, Jose L. Hernández-Ramos
General AI
Modern fuzzers increasingly use Large Language Models (LLMs) to generate structured inputs, but LLM-driven fuzzing is sensitive to prompt initialization and sampling variance, which can reduce exploration efficiency and lead to redundant inputs. We present FunFuzz, a multi-island evolutionary fuzzing framework that run…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-04 · Shikhar Shukla
General AI
Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $γ$, which determines how many tokens the draft model proposes per step. Nearly all exis…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-07 · Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang
General AI
Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decompositio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-07 · Suoxin Zhang, Run He, Di Fang, Xiang Tan, Kaixuan Chen, Huiping Zhuang
General AI
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method that places trainable low-rank adapters into frozen pre-trained models. Recent studies show that using fewer LoRA adapters may still maintain or even improve performance, but existing methods still distribute adapters broadly, leaving wh…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-07 · Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier
General AI
Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework that jointly performs DR severity grading…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-12 · Yabo Zhang, Kunchang Li, Dewei Zhou, Xinyu Huang, Xun Wang
General AI
While recent advancements in multimodal language models have enabled image generation from expressive multi-image instructions, existing methods struggle to maintain performance under complex interleaved instructions. This limitation stems from the structural separation of images and text in current paradigms, which fo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-22 · Zisu Huang, Jingwen Xu, Yifan Yang, Ziyang Gong, Qihao Yang, Muzhao Tian, Xiaohua Wang, Changze Lv, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Xue Yang, Dongdong Chen, Xiaoqing Zheng, Chong Luo
General AI
Language agents increasingly improve by reusing skills -- structured procedural artifacts distilled from past experience. In particular, domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recurring procedures, and…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-22 · Laura R. Marusich, Mary Grace Kozuch Dhooghe, Jonathan Z. Bakdash, Murat Kantarcioglu
General AI
Large language models (LLMs) have the potential to aid and improve human decision-making in classification tasks, not only by providing fairly accurate predictions, but also in their ability to generate cogent narrative explanations of those predictions. Prior work has demonstrated that people generally find AI narrati…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-22 · Anastasiia Sedova, Natalie Schluter, Skyler Seto, Maartje ter Hoeve
General AI
Cross-lingual knowledge transfer is critical for building high-performing multilingual language models for languages with insufficient training data. When target language data is scarce, the knowledge required for many downstream tasks involving scientific reasoning, commonsense inference, and world knowledge must be a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-22 · Yifan Lu, Qi Wu, Jay Zhangjie Wu, Zian Wang, Huan Ling, Sanja Fidler, Xuanchi Ren
General AI
Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a decoder maps the generated latents back to pixels. Yet the latent-to-pixel decoder is reconstruction-oriented, optimized to invert the encoder rather than synth…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-22 · Constantin Blessing, Elias Geiger, Jakob Häringer, Dennis Grewe, Markus Enzweiler
General AI
Deploying heterogeneous multi-agent robot fleets for collaborative perception requires robust data exchange and scalable software architectures. However, standard ROS 2 implementations often suffer from network saturation, namespace collisions, and severe computational overhead when distributing dense sensor streams ac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-27 · Fei Deng, Yanwu Xu, Zhipeng Bao, Zhixing Zhang, Haolin Jia, Karthik Raveendran, Jianing Wei
General AI
The remarkable generation quality of modern diffusion models often comes at the cost of massive parameter counts, which necessitate server-side inference with significant computational costs and potential privacy risks. Consequently, there is growing momentum toward developing efficient on-device alternatives. While re…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-28 · Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang, Michael Poli, Juan Carlos Niebles, Justin Johnson, Jiajun Wu, Li Fei-Fei
General AI
Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels. GPIC comprises diverse internet images captioned by a state-of-the-art vision-language model, including 100M training, 200K va…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-28 · Nhat-Minh Nguyen
General AI
Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 sessions to build CLAX-PT, a differentiable one-loop perturbation theory module in JAX. We documented and classified 15 s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-29 · Albert Sadowski, Jarosław A. Chudziak
General AI
The same arguments often need to be evaluated under different external regimes. An agent with influence over the regime has a strategic lever that standard formalisms do not directly capture. We introduce context-dependent argumentation frameworks (CDAFs), an extension of Dung's theory in which a defeat function determ…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-29 · Yanshu Li, Jiaqian Li, Kuai Yu, Xi Xiao, Dongfang Liu, Tianyang Wang, Ruixiang Tang
General AI
Large vision-language models (LVLMs) have demonstrated strong general multimodal capability and are increasingly deployed in downstream systems. This trend has driven growing interest in LVLM personalization, which aims to enable models to quickly and effectively learn out-of-distribution multimodal concepts to meet us…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-29 · Danish Ali, Li Xiaojian, Sundas Iqbal, Farrukh Zaidi
General AI
Multilingual orthopedic decision support remains challenging in low-resource healthcare settings, where clinical narratives contain specialized terminology, mixed scripts, incomplete evidence, label imbalance and language-dependent documentation patterns. This article presents a reliability-oriented framework for class…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-29 · Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong, Ivan Zhang, Matan Shtepel, Steffi Chern, Alexander Robey, Eric Wong, Hamed Hassani
General AI
Language models can find thousands of severe software vulnerabilities, and agents are increasingly being misused for cyberattacks. To avoid detection, attackers frequently distribute their misuse, splitting a harmful task across many user accounts so each individual transcript looks benign. Because safety monitors scor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-05-29 · Dylan Steiner, Gustavo Arango-Argoty, Gerald Sun, Etai Jacob
General AI
Multimodal models in oncology can produce accurate predictions, but accurate prediction does not reveal whether the model has learned biology that is shared across modalities, biology confined to one modality, or spurious correlations that reflect confounders rather than genuine biology. We introduce DECAT, a model-agn…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-03-23 · Yuze Qin, Qingyong Li, Zhiqing Guo, Wen Wang, Yan Liu, Yangli-ao Geng
General AI
Precipitation nowcasting is critical for disaster mitigation and aviation safety. However, radar-only models frequently suffer from a lack of large-scale atmospheric context, leading to performance degradation at longer lead times. While integrating meteorological variables predicted by weather foundation models offers…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-03-25 · Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi
General AI
Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.5
2026-03-26 · Mohamed Eltahir, Ahmed O. Ibrahim, Obada Siralkhatim, Tabarak Abdallah, Sondos Mohamed
Research Track A · General AI
Vision-Language Models (VLMs) are powerful open-set reasoners, yet their direct use as anomaly detectors in video surveillance is fragile: without calibrated anomaly priors, they alternate between missed detections and hallucinated false alarms. We argue the problem is not the VLM itself but how it is used. VLMs should…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-03-26 · Han Zhang, Guo-Hua Yuan, Chaohao Yuan, Tingyang Xu, Tian Bian, Hong Cheng, Wenbing Huang, Deli Zhao, Yu Rong
General AI
Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.5
2026-03-27 · Rangya Zhang, Jiaping Xiao, Lu Bai, Yuhang Zhang, Mir Feroskhan
Research Track A
Continual learning seeks to maintain stable adaptation under non-stationary environments, yet this problem becomes particularly challenging in object detection, where most existing methods implicitly assume relatively balanced visual conditions. In extreme-sparsity regimes, such as those observed in space-based residen…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.5
2026-03-27 · Xuerui Zhang, Xuehao Wang, Zhan Zhuang, Linglan Zhao, Ziyue Li, Xinmin Zhang, Zhihuan Song, Yu Zhang
Research Track A
Lifelong learning aims to preserve knowledge acquired from previous tasks while incorporating knowledge from a sequence of new tasks. However, most prior work explores only streams of homogeneous tasks (e.g., only classification tasks) and neglects the scenario of learning across heterogeneous tasks that posse…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.5
2026-03-29 · Chunmei Wang, Shangyou Zhang
Research Track A
This paper presents an auto-stabilized weak Galerkin (WG) finite element method for Biot's consolidation model within the classical displacement-pressure two-field formulation. Unlike traditional WG approaches, the proposed scheme achieves numerical stability without requiring traditional stabilizers. Spat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-03-30 · Ryan Po, David Junhao Zhang, Amir Hertz, Gordon Wetzstein, Neal Wadhwa, Nataniel Ruiz
General AI
Video world models have shown immense promise for interactive simulation and entertainment, but current systems still struggle with two important aspects of interactivity: user control over the environment for reproducible, editable experiences, and shared inference where players hold influence over a common world. To …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-06 · Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu, Xiangyan Liu, Zhenlong Yuan, Tianyu Pang, Michael Qizhe Shieh, Fengze Liu, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
General AI
OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-07 · Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu
General AI
We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific beha…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-13 · Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu
General AI
Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-13 · Efstathios Karypidis, Spyros Gidaris, Nikos Komodakis
General AI
Accurate future video prediction requires both high visual fidelity and consistent scene semantics, particularly in complex dynamic environments such as autonomous driving. We present Re2Pix, a hierarchical video prediction framework that decomposes forecasting into two stages: semantic representation prediction and re…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.5
2026-04-13 · Rok Spruk
Research Track A
This paper develops a political-economy theory of statehood without capacity. I argue that under specific institutional and geopolitical conditions, a polity can become trapped in an equilibrium of nominal statehood: a state in which claims to sovereignty, external recognition, and symbolic legitimacy persist or even s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-15 · Akira Kawabata, Saku Sugawara
General AI
Rubric-augmented verification guides reward models with explicit evaluation criteria, yielding more reliable judgments than single-model verification. However, most existing methods require costly rubric annotations, limiting scalability. Moreover, we find that rubric generation is vulnerable to a failure of cooperatio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-17 · Jiaxi Bi, Tongxu Luo, Wenyu Du, Zhengyang Tang, Benyou Wang
General AI
Parallel reasoning enhances Large Reasoning Models (LRMs) but incurs prohibitive costs due to futile paths caused by early errors. To mitigate this, path pruning at the prefix level is essential, yet existing research remains fragmented without a standardized framework. In this work, we propose the first systematic tax…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.5
2026-04-17 · Jason Cusati, Chris Brown
Research Track A
Software engineering research has experienced rapid growth in both output and participation over the past decades. Yet concerns persist about the field's ability to accumulate, integrate, and reuse knowledge in ways that support long-term progress. To better understand how the community itself perceives these challenge…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.5
2026-04-18 · Daeun Hwangbo, Junyeong Park, Minjeong Jeon, Ick Hoon Jin
Research Track A
Computer-based assessments routinely generate detailed interaction logs -- commonly referred to as process data -- that record every action a respondent performs during task completion, yet systematic preprocessing guidance, integrated analytical workflows, and cross-method consistency checks remain scarce in the liter…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.5
2026-04-20 · Stefan Tanevski
Research Track A
This paper asks how institutional stock-market integration reshapes the transmission of monetary policy through asset prices in small open economies. Motivated by the persistent segmentation of Western Balkan capital markets, we develop a two-stage counterfactual transmission framework to identify how stock-exchange co…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-22 · Haebin Seong, Li Yin, Haoran Zhang
General AI
AI agents are increasingly deployed on complex, domain-specific workflows -- navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-25 · Bingda Tang, Yuhui Zhang, Xiaohan Wang, Jiayuan Mao, Ludwig Schmidt, Serena Yeung-Levy
General AI
Aligning denoising generative models with human preferences or verifiable rewards remains a key challenge. While policy-gradient online reinforcement learning (RL) offers a principled post-training framework, its direct application is hindered by the intractable likelihoods of these models. Prior work therefore either …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-26 · Zhen Ye, Xu Tan, Aoxiong Yin, Hongzhan Lin, Guangyan Zhang, Peiwen Sun, Yiming Li, Chi-Min Chan, Wei Ye, Shikun Zhang, Wei Xue
General AI
Joint audio-video generation models have shown that unified generation yields stronger cross-modal coherence than cascaded approaches. However, existing models couple modalities throughout denoising via pervasive attention, treating high-level semantics and low-level details in a fully entangled manner. This is subopti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-27 · Bo Ni, Leyao Wang, Yu Wang, Branislav Kveton, Franck Dernoncourt, Yu Xia, Hongjie Chen, Reuben Leura, Samyadeep Basu, Subhojyoti Mukherjee, Puneet Mathur, Nesreen Ahmed, Junda Wu, Li Li, Huixin Zhang, Ruiyi Zhang, Tong Yu, Sungchul Kim, Jiuxiang Gu, Zhengzhong Tu, Alexa Siu, Zichao Wang, David Seunghyun Yoon, Nedim Lipka, Namyong Park, Zihao Lin, Trung Bui, Yue Zhao, Tyler Derr, Ryan A. Rossi
General AI
User simulation has long played a vital role in computer science due to its potential to support a wide range of applications. Language, as the primary medium of human communication, forms the foundation of social interaction and behavior. Consequently, simulating conversational behavior has become a key area of study.…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-28 · Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, Yuming Li, Xiaoxuan He, Mengzhao Chen, Haoyang Huang, Nan Duan, Ping Luo
General AI
While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining performance and real-world deployment requirements due to critical issues such as prompt sensitivity, temporal inconsistency…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-28 · Jiayi Guo, Linqing Wang, Jiangshan Wang, Yang Yue, Zeyu Liu, Zhiyuan Zhao, Qinglin Lu, Gao Huang, Chunyu Wang
General AI
Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their initial generation, potentially extending the performance upper bound. Current UMM-based refinement methods primarily…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-29 · Jun Guo, Qiwei Li, Peiyan Li, Zilong Chen, Nan Sun, Yifei Su, Heyun Wang, Yuan Zhang, Xinghang Li, Huaping Liu
General AI
We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM) that only model 2D pixel-space and fail to balance action effic…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-04-30 · Abdelrahman Sadallah, Kareem Elozeiri, Mervat Abassy, Rania Elbadry, Mohamed Anwar, Abed Alhakim Freihat, Preslav Nakov, Fajri Koto
General AI
Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of expression and cultural identity. While modern Arabic speakers continue to value poetry, existing research on Arabic poetry within Large Language Models (LLMs) has primarily focused on analysis tasks such as interpretation or m…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-05-29 · Stine Lyngsø Beltoft, William Brach, Federico Torrielli, Jacob Nielsen, Annemette Brok Pirchert, Filippo Tonini, Peter Schneider-Kamp, Lukas Galke Poech
Research Track A · General AI
Monitoring autonomous language model agents currently relies mostly on surface behavior. But what happens when agent populations invent new languages with the goal of avoiding human oversight? Here, we study the emergent languages on Moltbook. For this, we build upon the Moltbook Files dataset and apply a two-stage app…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-02 · Jongwook Han, Hyeongjin Kim, Yohan Jo
General AI
While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet, there are no benchma…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-04 · Noam Issachar, Dani Lischinski, Raanan Fattal
General AI
Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative timeline is inherent…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-04 · Rui Zhao, Kaiming Yang, Jifeng Zhu, Siyang Chen, Ziqi Wang, Weijia Wu, Kevin Qinghong Lin, Heng Wang, Mike Zheng Shou
General AI
Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? We propose robotic ma…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-10 · Niccolò Biondi, Federico Pernici, Simone Ricci, Alberto Del Bimbo
General AI
Learning compatible representations aims to learn feature representations that can be used interchangeably over time whenever a model undergoes updates. In this paper, we demonstrate that stationary representations learned by d-Simplex fixed classifiers imply compatibility as in its formal definition. This result estab…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-13 · Ziqing Qiao, Yinuo Xu, Chaojun Xiao, Zhou Su, Zihan Zhou, Yingfa Chen, Xiaoyue Xu, Xu Han, Zhiyuan Liu
General AI
Modern language models increasingly adopt hybrid architectures that combine full attention with efficient attention modules, such as sliding-window attention (SWA) and recurrent sequence mixers. However, how these efficient modules shape model capabilities remains poorly understood. To address this gap, we conduct a sy…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-15 · Sean Man, Ron Raphaeli, Matan Kleiner, Or Ronai
General AI
In this paper, we introduce SP^3, a novel Plug-and-Play algorithm that accelerates maximum a posteriori image restoration by replacing denoisers with Spherical Encoders (SE) as generative priors. SP^3 approximates the intractable proximal prior step by utilizing the SE's tightly structured latent space as a robust projec…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-15 · Chenxi Xie, Yuhui Wu, Qiaosi Yi, Lei Zhang
General AI
Existing image editing methods can be generally categorized into textual instruction-based and visual prompt-based ones. Textual instructions are semantically expressive, but are limited by the coarse granularity of spatial control of the editing results. In contrast, visual prompts such as drag and point can provide p…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-16 · Weichen Fan, Haiwen Diao, Penghao Wu, Ziwei Liu
General AI
Pixel-space diffusion models are trained on full-bandwidth noisy images, yet the useful signal available to the denoiser is strongly frequency dependent. Under rectified-flow diffusion and natural-image power-law spectra, the per-band data-to-noise contour k^{*}(t) = (1-t)^{-2/α} separates a signal-bearing low-frequenc…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-16 · Jingyuan Huang, Zuming Huang, Yucheng Shi, Tianze Yang, Xiaoming Zhai, Wei Chu, Ninghao Liu
General AI
Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promising post-training approach for this coordinate-sensitive task, since it provides dense to…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-17 · Tim Rädsch, Yuki M Asano, Hilde Kuehne, Stefan Bauer, Priyank Jaini, Robert Geirhos, Carsten T. Lüth
General AI
Video generative models (VGMs) have become a new frontier that can be used not just for video generation but for a multitude of downstream tasks, including world modeling. To advance these tasks, a good video model must understand the physical reality of the world. Evaluating this understanding is an emerging field an…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-23 · Alexandra Zelenin, Alexandra Zhuravlyova
General AI
Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction, but its forward pass requires the row-wise norm of W + sBA, a computation that every major framework we surveyed implements by materializing the dense [d_out, d_in] product BA. At d_in = 8192 and rank r = 384, a sin…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 6.3
2026-03-25 · Yupei Li, Shuaijie Shao, Manuel Milling, Björn Schuller
General AI
Current audio deepfake detection has achieved remarkable performance using diverse deep learning architectures such as ResNet, and has seen further improvements with the introduction of large models (LMs) like Wav2Vec. The success of large language models (LLMs) further demonstrates the benefits of scaling model parame…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-26 · Mingmeng Geng, Yuhang Dong, Thierry Poibeau
General AI
Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Due to the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-26 · Jiabin Hua, Hengyuan Xu, Aojie Li, Wei Cheng, Gang Yu, Xingjun Ma, Yu-Gang Jiang
General AI
Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off b…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-26 · Haoyan Yang, Mario Xerri, Solha Park, Huajian Zhang, Yiyang Feng, Sai Akhil Kogilathota, Jiawei Zhou
General AI
As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for further improvement. …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-30 · Jack Cook, Hyemin S. Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P. Chandrakasan, Song Han
General AI
NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distributio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-30 · Anuj Diwan, Eunsol Choi, David Harwath
General AI
We introduce ParaSpeechCLAP, a dual-encoder contrastive model that maps speech and text style captions into a common embedding space, supporting a wide range of intrinsic (speaker-level) and situational (utterance-level) descriptors (such as pitch, texture and emotion) far beyond the narrow set handled by existing mode…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-30 · Oliver Aleksander Larsen, Mahyar T. Moghaddam
General AI
Modern distributed systems integrate heterogeneous services (REST APIs with different schema versions, GraphQL endpoints, and IoT devices with proprietary payloads) that suffer from persistent schema mismatches. Traditional static adapters require manual coding for every schema pair and cannot handle novel combinations …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-30 · Aur Shalev Merin
General AI
Recurrent networks do not need Jacobian propagation to adapt online. The hidden state already carries temporal credit through the forward pass; immediate derivatives suffice if you stop corrupting them with stale trace memory and normalize gradient scales across parameter groups. An architectural rule predicts when nor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-30 · Chengyin Hu, Jiaju Han, Xuemeng Sun, Qike Zhang, Yiwei Wei, Ang Li, Chunlei Meng, Xiang Chen, Jiahuan Long
General AI
Vision-language models (VLMs) rely on a shared visual-textual representation space to perform tasks such as zero-shot classification, image captioning, and visual question answering (VQA). While this shared space enables strong cross-task generalization, it may also introduce a common vulnerability: small visual pertur…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-31 · Breno C. Bispo, Stefania Sardellitti, Juliano B. Lima, Fernando A. N. Santos
General AI
Brain connectomics is still largely dominated by pairwise-based models, such as graphs, which cannot represent circulatory or higher-order functional interactions. In this paper, we propose a multimodal framework based on Topological Signal Processing (TSP) that models the brain as a higher-order topological domain and…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-31 · Sowmya Vajrala, Aakash Parmar, Prasanna R, Sravanth Kodavanti, Manjunath Arveti, Srinivas Soumitri Miriyala, Ashok Senapati
General AI
Generative Artificial Intelligence (GenAI) features such as image editing, object removal, and prompt-guided image transformation are increasingly integrated into mobile applications. However, deploying Large Vision Models (LVMs) for such tasks on resource-constrained devices remains challenging due to their high memor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-03-31 · Timon Klein, Jonas Kusch, Sebastian Sager, Stefan Schnake, Steffen Schotthöfer
General AI
The pursuit of reducing the memory footprint of the self-attention mechanism in multi-head self-attention (MHA) has spawned a rich portfolio of methods, e.g., group-query attention (GQA) and multi-head latent attention (MLA). The methods leverage specialized low-rank factorizations across embedding dimensions or attentio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-02 · Abhilash Kar, Basisth Saha, Tanmay Sen, Biswabrata Pradhan
General AI
Multimodal time-to-event prediction often requires integrating sensitive data distributed across multiple parties, making centralized model training impractical due to privacy constraints. At the same time, most existing multimodal survival models produce single deterministic predictions without indicating how confiden…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-02 · Sten Rüdiger, Sebastian Raschka
General AI
Minor Component Adaptation (MiCA) is a novel parameter-efficient fine-tuning method for large language models that focuses on adapting underutilized subspaces of model representations. Unlike conventional methods such as Low-Rank Adaptation (LoRA), which target dominant subspaces, MiCA leverages Singular Value Decompos…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-02 · Hao Zhu, Di Zhou, Donna Slonim
General AI
Understanding causal dependencies in observational data is critical for informing decision-making. These relationships are often modeled as Bayesian Networks (BNs) and Directed Acyclic Graphs (DAGs). Existing methods, such as NOTEARS and DAG-GNN, often face issues with scalability and stability in high-dimensional data…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-06 · Dawar Khan, Alexandre Kouyoumdjian, Xinyu Liu, Omar Mena, Dominik Engel, Ivan Viola
General AI
We present ClickAIXR, a novel on-device framework for multimodal vision-language interaction with objects in extended reality (XR). Unlike prior systems that rely on cloud-based AI (e.g., ChatGPT) or gaze-based selection (e.g., GazePointAR), ClickAIXR integrates an on-device vision-language model (VLM) with a controlle…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-06 · Ke Shi, Yao Zhang, Feng Guo, Jinyuan Zhang, JunShuo Zhang, Shen Gao, Shuo Shang
General AI
Generative recommendation has emerged as a transformative paradigm for capturing the dynamic evolution of user intents in sequential recommendation. While flow-based methods improve the efficiency of diffusion models, they remain hindered by the "Noise-to-Data" paradigm, which introduces two critical inefficiencies: …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-07 · Qimin Zhong, Hao Liao, Haiming Qin, Mingyang Zhou, Rui Mao, Wei Chen, Naipeng Chao
General AI
Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical pers…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-07 · Andrew Kurtz, Klaudia Krawiecka
General AI
The governance of artificial intelligence has a blind spot: the machine identities that AI systems use to act. AI agents, service accounts, API tokens, and automated workflows now outnumber human identities in enterprise environments by ratios exceeding 80 to 1, yet no integrated framework exists to govern them. A sing…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-09 · Hananel Hazan, Yanbo Zhang, Benedikt Hartl, Michael Levin
General AI
How many of a neural network's parameters actually encode task-specific information? We investigate this question with LottaLoRA, a training paradigm in which every backbone weight is drawn at random and frozen; only low-rank LoRA adapters are trained. Across nine benchmarks spanning diverse architecture families from …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-09 · Longxiang Jiao, Lukas Hofmann, Yiru Yang, Zhanyi Wu, Jonas Egeler
General AI
While micro-scale traffic simulations provide essential data for urban planning, they are rarely coupled with the high-fidelity visualization or auralization necessary for effective stakeholder communication. In this work, we present a real-time 4D visualization framework that couples the SUMO traffic with a photoreali…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-09 · Simon Gerstenecker, Andreas Geiger, Katrin Renz
General AI
Generalization under distribution shift remains a central bottleneck for closed-loop autonomous driving. Although simulators like CARLA enable safe and scalable testing, existing benchmarks rarely measure true generalization: they typically reuse training scenarios at test time. Success can therefore reflect memorizati…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-09 · Tao Xie, Peishan Yang, Yudong Jin, Yingfeng Cai, Wei Yin, Weiqiang Ren, Qian Zhang, Wei Hua, Sida Peng, Xiaoyang Guo, Xiaowei Zhou
General AI
This paper addresses the task of large-scale 3D scene reconstruction from long video sequences. Recent feed-forward reconstruction models have shown promising results by directly regressing 3D geometry from RGB images without explicit 3D priors or geometric constraints. However, these methods often struggle to maintain…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-09 · Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha
General AI
Applying steering vectors to large language models (LLMs) is an efficient and effective model alignment technique, but we lack an interpretable explanation for how it works: specifically, what internal mechanisms steering vectors affect and how this results in different model outputs. To investigate the causal mechani…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-10 · Harshith Kethavath, Weiming Hu
General AI
Adapting vision-language models to remote sensing imagery presents a fundamental challenge: both the visual and linguistic distributions of satellite data lie far outside natural image pretraining corpora. Despite this, prompting remains the dominant deployment paradigm, driven by the assumption that domain-specific la…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-12 · Rahul Ahuja, Mudit Jain, Bala Murali Manoghar Sai Sudhakar, Venkatraman Narayanan, Pratik Likhar, Varun Ravi Kumar, Senthil Yogamani
General AI
Vision foundation models (VFMs) and Bird's Eye View (BEV) representation have advanced visual perception substantially, yet their internal spatial representations assume the rectilinear geometry of pinhole cameras. Fisheye cameras, widely deployed on production autonomous vehicles for their surround-view coverage, exhi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-13 · Wanli Ma, Sivasakthy Selvakumaran, Dain G. Farrimond, Adam A. Dennis, Samuel E. Rigby
General AI
Accurate and rapid structural damage assessment (SDA) is crucial for post-disaster management, helping responders prioritise resources, plan rescues, and support recovery. Traditional field inspections, though precise, are limited by accessibility, safety risks, and time constraints, especially after large explosions. …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-13 · Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono
General AI
Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep learning-based auto-segmentation can reduce workload, safe clinical de…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-13 · Yuto Harada, Hiro Taiyo Hamada
General AI
Using psychological constructs such as the Big Five, large language models (LLMs) can imitate specific personality profiles and predict a user's personality. While LLMs can exhibit behaviors consistent with these constructs, it remains unclear where and how they are represented inside the model and how they relate to b…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-14 · Jian Han, Jinlai Liu, Jiahuan Wang, Bingyue Peng, Zehuan Yuan
General AI
While diffusion models dominate the field of visual generation, they are computationally inefficient, applying uniform computational effort regardless of content complexity. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but are often hindered by…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-14 · Cristian Minoccheri, Emily Wittrup, Kayvan Najarian, Ryan Stidham
General AI
Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (IBD), yet the representational choices that best support automated analysis of this modality are unknown. We present the first study of vision-language transfer learning on abdominal CT enterography and identif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-14 · Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding
General AI
On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We first identify that two conditions govern whether OPD succeeds or fails: (i) the s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-16 · Jack Wei Lun Shi, Minghao Dang, Wawan Solihin, Justin K. W. Yeoh
General AI
Existing research on large language models (LLMs) for automated code compliance has primarily focused on performance, treating the models as black boxes and overlooking how training decisions affect their interpretive behavior. This paper addresses this gap by employing a perturbation-based attribution analysis to comp…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-17 · Yeganeh Abdollahinejad, Ahmad Mousavi, Naeemul Hassan, Kai Shu, Nathalie Japkowicz, Shahriar Khosravi, Amir Karami
General AI
The widespread dissemination of multimodal content on social media has made misinformation detection increasingly challenging, as misleading narratives often arise not only from textual or visual content alone, but also from semantic inconsistencies between modalities and their evolution over time. Existing multimodal …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-17 · Hitesh Mehta, Arjit Saxena, Garima Chhikara, Rohit Kumar
Research Track A · General AI
This paper explores the response of Large Language Models (LLMs) to user prompts with different degrees of politeness and impoliteness. The Politeness Theory by Brown and Levinson and the Impoliteness Framework by Culpeper form the basis of experiments conducted across three languages (English, Hindi, Spanish), five mo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-17 · Pritesh Jha
General AI
We present PIIBench, a unified benchmark corpus for Personally Identifiable Information (PII) detection in natural language text. Existing resources for PII detection are fragmented across domain-specific corpora with mutually incompatible annotation schemes, preventing systematic comparison of detection systems. We co…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-17 · Haoran Feng, Yifan Niu, Zehuan Huang, Yang-Tian Sun, Chunchao Guo, Yuxin Peng, Lu Sheng
General AI
We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in the native 3D space, formulating layout generation as an autoregressive process that explicitly models geometric rela…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-17 · Maks Pečnik Bambič, Nuno A. M. Araújo, Giorgio Volpe
General AI
Collective rotations are common in active matter, enhancing cohesion, transport, and mixing. They are typically attributed to chiral non-reciprocal dynamics due to intrinsic particle chirality, torque-generating interactions among units, or geometric confinement. Here, we uncover a different mechanism for rotational or…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-19 · Nwe Ni Win, Jim Basilakis, Steven Thomas, Seyhan Yazar, Laura Pierce, Stephanie Liu, Paul M. Middleton, Nasser Ghadiri, X. Rosalind Wang
General AI
Extracting clinically relevant information from unstructured medical narratives such as admission notes, discharge summaries, and emergency case histories remains a challenge in clinical natural language processing (NLP). Medical Entity Recognition (MER) identifies meaningful concepts embedded in these records. Recent …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-20 · Hao Meng, Siyuan Zheng, Shuran Zhou, Qiangqiang Wang, Yang Song
General AI
Large Language Models (LLMs) show promise in lyric-to-melody generation, but models trained with Supervised Fine-Tuning (SFT) often produce musically implausible melodies with issues like poor rhythm and unsuitable vocal ranges, a phenomenon we term "constraint violation". To address this, we propose a novel alignment …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-20 · Yunke Ao, Le Chen, Bruce D. Lee, Assefa S. Wahd, Aline Czarnobai, Philipp Fürnstahl, Bernhard Schölkopf, Andreas Krause
General AI
Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying foundations of trust region methods and the heuristic clipped objective used in P…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-20 · Terence Lim, Kumar Muthuraman, Michael Sury
General AI
We introduce a multi-agent framework intended to emulate parts of a quantitative research team and support equity factor research on large financial panel datasets. QRAFTI integrates a research toolkit for panel data with MCP servers that expose data access, factor construction, and custom coding operations as callable…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-21 · Alex Lin, Lei Gao, Narsimlu Kemsaram, Sriram Subramanian
General AI
AcoustoBots are mobile acoustophoretic robots capable of delivering mid-air haptics, directional audio, and acoustic levitation, but existing implementations rely on scripted commands and lack an intuitive interface for real-time human control. This work presents a gesture-based visual learning framework for contactles…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-21 · Robert Stanley, Avi Verma, Lillian Tsai, Konstantinos Kallas, Sam Kumar
General AI
AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g., personal and financial information). This poses a serious risk to security and privacy. Adversaries may attack the AI model (e.g., via prompt injection) to exfiltrate user da…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-22 · Casey Crane
General AI
We study the emergence of symmetric oscillatory behavior in multi-agent systems where each agent incorporates a continuous memory of its past states and past rates of change, modeled by distributed retarded and neutral delays. The closed-loop dynamics are described by a system of nonlinear neutral functional differenti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-22 · Thorsten Hoeser, Felix Bachofer, Claudia Kuenzer
General AI
The offshore wind energy sector is expanding rapidly, increasing the need for independent, high-temporal-resolution monitoring of infrastructure deployment and operation at global scale. While Earth Observation based offshore wind infrastructure mapping has matured for spatial localization, existing open datasets lack …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-22 · Travis LaCroix
General AI
The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but w…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-22 · Ruohan Liu, Shukang Yin, Tao Wang, Dong Zhang, Weiji Zhuang, Shuhuai Ren, Ran He, Caifeng Shan, Chaoyou Fu
General AI
Paralinguistic cues are essential for natural human-computer interaction, yet their evaluation in Large Audio-Language Models (LALMs) remains limited by coarse feature coverage and the inherent subjectivity of assessment. To address these challenges, we introduce SpeechParaling-Bench, a comprehensive benchmark for para…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-23 · Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil, Sergio Burdisso, Petr Motlicek, Shiran Liu, Mickael Rouvier, Jane Wottawa, Richard Dufour
General AI
Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This paper evaluates their …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-23 · D. Pauli, T. N. Parsons, R. K. Prinja
General AI
Massive stars with their strong ionizing radiation and strong stellar winds are the key feedback agents of the universe. Stellar winds of massive stars are often measured by fitting resonance lines in the UV using non-LTE stellar atmosphere models. So far, the line formation regions of these lines have not been measure…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-23 · Bartosz Balis, Michal Orzechowski, Piotr Kica, Michal Dygas, Michal Kuszewski
General AI
Scientific workflow systems automate execution -- scheduling, fault tolerance, resource management -- but not the semantic translation that precedes it. Scientists still manually convert research questions into workflow specifications, a task requiring both domain knowledge and infrastructure expertise. We propose an a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-23 · Yuto Nishida, Naoki Shikoda, Yosuke Kishinami, Ryo Fujii, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe
General AI
Understanding what kinds of factual knowledge large language models (LLMs) memorize is essential for evaluating their reliability and limitations. Entity-based QA is a common framework for analyzing non-verbatim memorization, but typical evaluations query each entity using a single canonical surface form, making it dif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-23 · Jiahui Liang, Shuoyao Wang, Shijian Gao
General AI
Efficient beam alignment is fundamental to high-throughput and reliable connectivity in Vehicle-to-Everything (V2X) systems. However, conventional beam management in dynamic vehicular topologies incurs prohibitive alignment overhead and struggles to maintain robust links under rapid mobility. To overcome these challeng…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-23 · Haolin Zhang, William Reber, Yuxuan Zhang, Guofei Gu, Jeff Huang
General AI
Modern phishing campaigns increasingly evade snapshot-based URL classifiers using interaction gates (e.g., checkbox/slider challenges), delayed content rendering, and logo-less credential harvesters. This shifts URL triage from static classification toward an interactive forensics task: an analyst must actively navigat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-24 · Sheza Munir, Ratna Kandala, Anamta Khan, Deepti, Joyojeet Pal
General AI
Health misinformation remains one of the most pressing challenges on social media, particularly when cultural traditions intersect with scientific-sounding claims. These dynamics are not only global but also deeply local, manifesting in culturally specific controversies that require careful analysis. Motivated by this,…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-24 · Md Erfan, Md Kamal Hossain Chowdhury, Ahmed Ryan, Md Rayhanur Rahman
General AI
Large Language Models (LLMs) show promise in automated software engineering, yet their guarantee of correctness is frequently undermined by erroneous or hallucinated code. To enforce model honesty, formal verification requires LLMs to synthesize implementation logic alongside formal specifications that are subsequently…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-24 · Xiang Zhang, Xiaotian Li, Taoyue Wang, Nan Bi, Xin Zhou, Cody Zhou, Zoie Wang, Andrew Yang, Yuming Su, Jeff Cohn, Qiang Ji, Lijun Yin
General AI
Social interactions dominate our perceptions of the world and shape our daily behavior by attaching social meaning to acts as simple and spontaneous as gestures, facial expressions, voice, and speech. People mimic and otherwise respond to each other's postures, facial expressions, mannerisms, and other verbal and nonve…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-24 · Jay Yu, Shunfan Zhou, Hang Yin, Brian Seong
General AI
Blockchain wallets conventionally follow an ownership model where possession of a private key grants unilateral control. However, this assumption is brittle for emerging settings such as AI agent wallets, organizational custody, and enterprise payroll, where multiple actors must coordinate without exposing secrets or l…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-24 · Ilana Nguyen, Harini Suresh, Thema Monroe-White, Evan Shieh
General AI
Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. While many works highlight the new potential applications of LLMs, there are risks of LLMs encoding and perpetuating h…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-27 · German Marin, Jatin Chaudhary
General AI
Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the Informational Viability Principle: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) +…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-27 · Hermawan Manurung, Ibrahim Al-Kahfi, Ahmad Rizqi, Martin Clinton Tosima Manullang
General AI
Indonesian marketplace reviews mix standard vocabulary with slang, regional loanwords, numeric shorthands, and emoji, making lexicon-based sentiment tools unreliable in practice. This paper describes a two-track classification pipeline applied to the PRDECT-ID dataset, which contains 5,400 product reviews from 29 Indon…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-28 · Sherzod Turaev, Mary John, Jaloliddin Rustamov, Zahiriddin Rustamov, Saja Aldabet, Nazar Zaki, Khaled Shuaib
General AI
Understanding learners' cognitive and affective states underpins adaptive educational systems and effective teaching. Although research links nonverbal cues to internal states, no framework calibrates them to evidence. We present the Nonverbal Syntax Framework, drawn from a systematic review of 908 studies and 17,043 c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-30 · Anietta Weckauff, Yuchen Zhang, Maksym Andriushchenko
General AI
Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlation between harmful behavior and self-assessment in emergently misaligned models, it remains unclear how consistent this c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-30 · Yujun Wu, Dongxu Zhang, Xinchen Li, Jinhang Xu, Yiling Duan, Yumou Liu, Jiabao Pan, Xuanhe Zhou, Jingxuan Wei, Siyuan Li, Jintao Chen, Conghui He, Cheng Tan
General AI
Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one anothe…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-30 · Jeanne Monnier, Thomas George, Frédéric Guyard, Christèle Tarnec, Marios Kountouris
General AI
Fairness in machine learning remains challenging due to its ethical complexity, the absence of a universal definition, and the need for context-specific bias metrics. Existing methods still struggle with intersectionality, multiclass settings, and limited flexibility and generality. To address these gaps, we introduce …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-30 · Zeynep Okray, Nils Otto, Anna A. Cook, Clifford Talbot, Ashwin Miriyala, Martín Klappenbach, Ciara Stern, Kieran Desmond, Paola Vargas-Gutierrez, Scott Waddell
General AI
Associating multiple sensory cues with a single experience or object is a fundamental process that improves object recognition and memory performance. However, neural mechanisms that bind sensory features during learning and augment memory expression are unknown. Here we demonstrate multisensory appetitive and aversive…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-04-30 · Samuel Kiegeland, Vésteinn Snæbjarnarson, Tim Vieira, Ryan Cotterell
General AI
Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a unit underspecified. In practice, experimental stimuli are segmented into linguistically motivated units (e.g., words), while pretrained language models assign probability…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-05-01 · Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer
General AI
Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed.…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-05-01 · Alfredo Madrid-García, Miguel Rujas
General AI
Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls. Objective: To re…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-04 · Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang
General AI
In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce {\tt Repea…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-04 · Yan Wang, Tianyang Hu
General AI
Topological Data Analysis (TDA) offers a principled, intrinsic lens for comparing neural representations. However, existing paired topological divergences (e.g., RTD) are limited by heuristic asymmetry and, more critically, unbounded scores that depend on sample size, hindering reliable cross-scenario benchmarking. To …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-08 · Avijit Ghosh, Anka Reuel, Jenny Chim, Wm. Matthew Kennedy, Srishti Yadav, Jennifer Mickel, Yanan Long, Andrew Tran, Anastassia Kornilova, Damian Stachura, Kevin Klyman, Felix Friedrich, Jeba Sania, Max Lamparth, Jan Batzner, Anoop Mishra, Eliya Habba, Yixiong Hao, Nathan Heath, Shalaleh Rismani, Usman Gohar, Andrea Loehr, David Manheim, Ruchira Dhar, Sree Harsha Nelaturu, Aarush Sinha, Leshem Choshen, Drishti Sharma, Ishan Khire, Amit Saha, Subramanyam Sahoo, Michael Hardy, Michael Alexander Riegler, Kabir Manghnani, Michelle Lin, Yanan Jiang, Yilin Huang, Asaf Yehudai, Jessica Ji, Aris Hofmann, Mubashara Akhtar, Nuno Moniz, Yacine Jernite, Stella Biderman, Zeerak Talat, Sanmi Koyejo, Mykel Kochenderfer, Irene Solaiman
General AI
AI evaluation results are produced at scale but reported inconsistently across leaderboards, model cards, benchmark papers, and company blogs. The cost is interpretive: readers cannot reliably compare results across sources, identify what a report omits, or trace an aggregate claim to its underlying evidence. Recent ef…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-08 · Antonio Scala
General AI
AI-mediated information manipulation increasingly takes the form of social cyber attacks that target trust, attention, credibility, reputation, and decision-making rather than only technical infrastructures or isolated false contents. Existing defensive approaches often oscillate between incident-level analysis, which …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-08 · Oladimeji Anthonio, Dimeji Abdulsobur Olawuyi, Oloruntoba Ajayi, Temiloluwa Aderemi, Joseph Odamo
General AI
Clinical artificial intelligence (AI) systems routinely produce predictions without principled quantification of uncertainty, limiting their trustworthiness in high-stakes medical environments. This paper presents an integrated research programme addressing two interconnected problems: (1) the development of a fully en…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-09 · Semih Kara, Oğuzhan Ersoy
General AI
Conditioning a language model on additional context, such as feedback on a previous attempt, typically improves its response. Self-distillation trains the model to retain this improvement when the context is not present. The method works by matching the model's output distribution under two settings: a student that see…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-11 · Andy Tang, William Chen, Andrew Wagenmaker, Chelsea Finn, Sergey Levine
General AI
Generalist policies can learn a wide range of skills from diverse robot datasets. In order to solve or improve on challenging new tasks, we need a way to infer and invoke the appropriate actions from the policy's rich behavioral prior, especially when directly commanding the policy fails. We focus on flow matching gen…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-15 · Kevin Yuanbo Wu, Tianxing Zhou, Isaac Tu, Billy Yan, Irmak Guzey, David Fouhey, Dandan Shan, Lerrel Pinto
General AI
Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is from humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-spec…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-15 · Kareem Amin, Rudrajit Das, Alessandro Epasto, Adel Javanmard, Dennis Kraft, Mónica Ribero, Sergei Vassilvitskii
General AI
The rapid adoption of generative AI and Large Language Models (LLMs) has spurred interest in synthetic data as a privacy-preserving alternative to sensitive real-world datasets. However, generating high-utility synthetic data often carries the risk of memorizing and regurgitating private information from the training c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-16 · Rishit Dagli, Donglai Xiang, Vismay Modi, Xuning Yang, Gavriel State, David I. W. Levin, Maria Shugrina
General AI
Accurate mechanical properties (or materials), namely Young's modulus ($E$), Poisson's ratio ($ν$), and density ($ρ$), are essential for reliable physics simulation of digital worlds, but most 3D assets lack this information. We propose AdaVoMP, a method for predicting accurate dense spatially-varying ($E$, $ν$, $ρ$) for input 3…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-16 · Ning Gao, Jinliang Zheng, Xing Gao, Haoxiang Ma, Hanqing Wang, Yukai Wang, Jiantong Chen, Zanxin Chen, Shujie Zhang, Mingda Jia, Xuekun Jiang, Zihou Zhu, Xinyu Li, Shuai Wang, Hao Li, Wenzhe Cai, Yuqiang Yang, Xudong Xu, Zhaoyang Lyu, Yao Mu, Tai Wang, Jiangmiao Pang, Jia Zeng, Weinan Zhang, Chunhua Shen
General AI
We present EBench, a simulation benchmark that diagnoses generalist mobile manipulation policies beyond a single success-rate scalar. EBench comprises 26 diverse and challenging manipulation tasks annotated along 5 capability dimensions and 4 generalization dimensions. We evaluate state-of-the-art generalist manipulati…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-16 · Tanmay Srivastava, Paras Bhavnani, Benjir Alvee Islam, Shubham Jain
General AI
We introduce MAJIC, a multimodal emotion recognition system that leverages articulatory motion of the jaw and facial muscles for speech-based emotion recognition (SER). While most SER systems perform well on datasets with strongly expressed emotional speech of trained actors, their performance often degrades when emoti…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-17 · Haohua Que, Zhipeng Bao, Qianyi Wu, Handong Yao
General AI
Cloud-hosted large multimodal models (LMMs) can provide strong open-vocabulary perception for Vehicle-to-Everything systems, but naively transmitting full-resolution frames from edge to cloud causes severe communication overhead and high cloud-side prefill latency. We present CABLE, a cloud-assisted bandwidth-efficient…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-06-23 · Zhuoren Ye, Tianyu Wo, Dinghao Xue, Mingming Zhang, Yuchen Teng, Chunming Hu, Renyu Yang
General AI
Emerging LLM services increasingly host many sparse MoE models, yet most models receive sparse requests and remain cold. This creates a GPU memory problem: model weights are stable and model-determined, while KV-cache is transient and demand-determined. Because cold models rarely reach peak KV-cache demand at the same …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-06-23 · Maggie Wang, Lars Osterberg, Stephen Tian, Ola Shorinwa, Jiajun Wu, Mac Schwager
General AI
Vision-language-action (VLA) models can learn manipulation skills from demonstrations, but their capabilities are bounded by the skills in the training data. We present InSight, a framework that unlocks autonomous skill acquisition by rendering VLAs steerable at the primitive-action level (e.g., "move gripper to the bo…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-06-24 · Botao He, Zhi Wang, Linna Kuang, Ishaan Ghosh, Jitendra Malik, Cornelia Fermuller, Tingfan Wu, Jiayuan Mao, Ruoshi Liu, Haozhi Qi, Yiannis Aloimonos
General AI
Human demonstrations are a scalable data source for learning robot manipulation policies. However, common sources of human demonstration data, such as motion-capture trajectories and internet videos, capture mostly motion and appearance while missing the contact forces that are critical for force-sensitive manipulation…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-06-24 · Qiyang Lyu, Zhenyu Wu, Wei Wang, Hongming Shen, Danwei Wang
General AI
Localization in challenging environments, such as GNSS-denied, geometrically repetitive, or textureless scenes commonly found in offices, hotels, and underground parking facilities, remains an open problem for reliable autonomous mobile robot (AMR) deployment. Single-modality localization methods are inherently limited…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.2
2026-06-24 · Lawrence S. Moss, Arthur Paul Pedersen
General AI
This theoretical note studies the finite axiomatizability of strict majority reasoning in finite social decision frames. Moss and Pedersen (2026) <doi: 10.48550/arXiv.2606.23853> introduce a coherence criterion that characterizes exactly when qualitative majority judgments are representable by a finitely additive measu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.0
2026-04-02 · William Hoy, Binxu Wang, Xu Pan
Research Track A · General AI
Evolution Strategies (ES) have emerged as a scalable gradient-free alternative to reinforcement learning based LLM fine-tuning, but it remains unclear whether comparable task performance implies comparable solutions in parameter space. We compare ES and Group Relative Policy Optimization (GRPO) across four tasks in bot…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.0
2026-04-02 · Zhanting Zhou, KaHou Tam, Ziqiang Zheng, Zeyu Ma
Research Track A · General AI
Multimodal recommendation systems (MRS) jointly model user-item interaction graphs and rich item content, but this tight coupling makes user data difficult to remove once learned. Approximate machine unlearning offers an efficient alternative to full retraining, yet existing methods for MRS mainly rely on a largely uni…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.0
2026-04-07 · Mutsumi Sasaki, Kouta Nakayama, Yusuke Miyao, Yohei Oseki, Masaru Isonuma
Research Track A · General AI
When introducing Large Language Models (LLMs) into industrial applications, such as healthcare and education, the risk of generating harmful content becomes a significant challenge. While existing machine unlearning methods can erase specific harmful knowledge and expressions, diverse harmful content makes comprehensiv…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.0
2026-04-16 · Zhen Yang, Ping Jian, Zhongbin Guo, Zuming Zhang, Chengzhi Li, Yonghong Deng, Xinyue Zhang, Wenpeng Lu
Research Track A
Over the past year, spatial intelligence has drawn increasing attention. Many prior works study it from the perspective of visual-spatial intelligence, where models have access to visuospatial information from visual inputs. However, in the absence of visual information, whether linguistic intelligence alone is suffici…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-04-30 · Ansar Aynetdinov, Patrick Haller, Alan Akbik
General AI
Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for high-resource non-English languages like German, French, or Japanese, aggressive filtering creates a strategic dilemma: should practitioners prioritize diversity by tra…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-05-04 · Sinan Wang, Jinjin He, Shenyifan Lu, Ruicheng Wang, Greg Turk, Bo Zhu
General AI
We present Orbit-Space Geometric Probability Paths (OGPP), a particle-native flow-matching framework for generative modeling of particle systems. OGPP is motivated by two insights: (i) particles are defined up to permutation symmetries, so anonymous indexing inflates per-index target variance and yields curved, hard-to…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-05-06 · Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen, Jun Zhu
General AI
LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-05-07 · Ilya Borovik
General AI
Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-sc…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-05-07 · Ziyun Zeng, Yiqi Lin, Guoqiang Liang, Mike Zheng Shou
General AI
In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly available datasets predominantly focus on local editing or style transfer, which largely preserve the original scene structure and are easier to scale. In contrast, Backgroun…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-05-12 · Jiafei Lyu, Zichuan Lin, Scott Fujimoto, Kai Yang, Yangkun Chen, Saiyong Yang, Zongqing Lu, Deheng Ye
General AI
Model-based representations have recently emerged as a promising framework that embeds latent dynamics information into representations for downstream off-policy actor-critic learning. This framework implicitly combines the advantages of both model-free and model-based approaches while avoiding the training costs associated with …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-05-12 · Haoyu Zhang, Zeyu Zhang, Zedong Zhou, Yang Zhao, Hao Tang
General AI
Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increas…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-05-26 · Shuang Liang, Chaochuan Hou, Xu Yao, Shiping Wang, Hailiang Huang, Songqiao Han, Minqi Jiang
General AI
While previous research in multivariate time series forecasting has focused on developing complex holistic models, this work advocates for a shift toward a granular, component-level understanding of their impacts. We propose TSCOMP, the first large-scale benchmark that systematically deconstructs deep forecasting metho…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-05-29 · Arnas Uselis, Darina Koishigarina, Seong Joon Oh
General AI
Humans easily determine which color belongs to which shape in multi-object scenes, an ability known as concept binding. Vision-language embedding models such as CLIP struggle with binding: they recognize individual concepts but fail to represent which concepts form which objects. Although CLIP behaves like a bag-of-con…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.0
2026-06-09 · Bowen Ping, Xiangxin Zhou, Penghui Qi, Minnan Luo, Liefeng Bo, Tianyu Pang
General AI
Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising process as a Markov Decision Process and apply PPO-style ratio clipping to enforce a trust…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-04 · Anahita Golrang, Kshitij Sharma, Olga Viberg
General AI
Effective pair programming depends on coordination of attention, cognitive effort, and joint regulation over time, yet most adaptive learning systems remain individual-centric and reactive. This paper introduces ProPACT, a proactive AI-driven adaptive collaborative tutor that treats collaboration itself as the object o…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-07 · Sushant Gautam, Finn Schwall, Annika Willoch Olstad, Fernando Vallecillos Ruiz, Birk Torpmann-Hagen, Sunniva Maria Stordal Bjørklund, Leon Moonen, Klas Pettersen, Michael A. Riegler
General AI
Many deployments must compare candidate language models for safety before a labeled benchmark exists for the relevant language, sector, or regulatory regime. We formalize this setting as benchmarkless comparative safety scoring and specify the contract under which a scenario-based audit can be interpreted as deployment…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-07 · Jai Moondra, Ayela Chughtai, Bhargavi Lanka, Swati Gupta
General AI
Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of the decisive votes c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-11 · Mohammed M. H. Qazzaz, Syed Ali Zaidi, Aubida A. Al-Hameed, Abdelaziz Salama, Des Mclernon
General AI
This paper introduces a proactive Unmanned Aerial Vehicle (UAV) mobility management xApp for Open Radio Access Network (O-RAN) Near Real-Time Radio Intelligent Controller (Near-RT RIC) environments, employing Double Deep Q-Network (DDQN) reinforcement learning (RL) enhanced with transfer learning to optimise handover d…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-22 · Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma
General AI
Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scaling Law, a unified the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-22 · Renhe Sun, Jiayi Zhou, Haolin He, Yueying Feng, Jian Liu
General AI
In this technical report, we describe our submission for the WildSpoof Challenge TTS Track: Text-to-Speech with In-the-Wild Data. We introduce F5-TTS-DPS, a model built upon the F5-TTS architecture. Our approach integrates Exponential Moving Average (EMA) into supervised fine-tuning to stabilize training and improve ge…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-22 · Taiming Lu, Zhuang Liu
General AI
Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield better students. In this work, we examine this assumption about distillation in large language model pretraining. By varying architecture sizes and training token budgets, we create strong-to-weak, same-level, and weak-…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-22 · Shaoxuan Zhou, Yafei Sun, Jing Zhang, Xianghang Mi
General AI
Short-video platforms like Douyin and Kwai have become central to adolescent digital life, but they also risk exposing teens to algorithmically amplified harmful content. Despite its societal importance, the scale, mechanisms, and real-world impact of this exposure remain poorly understood. Measuring it is challenging:…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-28 · Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang, Xin Zhang, Wenshan Wu, Qihao Zhao, Hao Li, Yuanyuan Gao, Kim-Hui Yap, Scarlett Li
General AI
Large Language Models (LLMs) have revolutionized various fields, yet their training efficiency is heavily reliant on effective data curation. While data selection has been widely studied, the strategic data organization for enhanced training remains an underexplored area, particularly since current LLMs are often train…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-28 · Jusuk Lee, Seungjae Lee, Jonghun Shin, Hoseong Jung, Sungha Kim, Daesol Cho, H. Jin Kim, Jia-Bin Huang, Furong Huang
General AI
Robot manipulation critically depends on perception that preserves the action-relevant aspects of a scene. Yet most robot learning pipelines are built upon visual encoders pre-trained for static recognition or vision-language alignment, leaving motion understanding to downstream policies. We introduce DynaFLIP, a dynam…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-28 · Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman
General AI
Moving Object Segmentation (MOS) aims to discover, segment, and track objects that move independently of the camera. Current MOS methods, however, exhibit two fundamental limitations: they rely on pre-computed 2D auxiliary modalities such as optical flow or point trajectories that lack 3D geometric information, and the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-29 · Elena Yanakieva, Annette Bieniusa, Stefania Dumbrava
General AI
Distributed applications increasingly support local-first collaboration over shared data, allowing multiple users to perform updates concurrently without global coordination. Such collaboration requires careful design to capture the intended semantics of the concurrent interactions. We introduce a declarative framework…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-05-29 · Ulrich Prestel, Stefan Andreas Baumann, Nick Stracke, Björn Ommer
General AI
Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network system designs. We introduce RayDer, a unified, feed-forward transformer that consolidate…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.8
2026-06-06 · Bilal Hussain, Muhammad Bilal, Tan Li, Haris Pervaiz, Xiao Tang, Qinghe Du, Fawad Ahmad, Muhammad Azhar, Jun Zhang
General AI
In sixth-generation (6G) networks, billions of cyber-physical systems (CPSs) - autonomous vehicles, smart grids, industrial robots, and remote-surgical equipment - will run over ultra-reliable low-latency slices, collapsing the gap between a remote breach and physical harm to milliseconds, a budget perimeter firewalls …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-03-25 · Matan Ben-Yosef, Tavi Halperin, Naomi Ken Korem, Mohammad Salama, Harel Cain, Asaf Joseph, Anthony Chen, Urska Jelercic, Ofir Bibi
General AI
Controlling video and audio generation requires diverse modalities, from depth and pose to camera trajectories and audio transformations, yet existing approaches either train a single monolithic model for a fixed set of controls or introduce costly architectural changes for each new modality. We introduce AVControl, a …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-03-25 · Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna
General AI
Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.5
2026-03-29 · Zhongying Deng, Cheng Tang, Ziyan Huang, Jiashi Lin, Ying Chen, Junzhi Ning, Chenglong Ma, Jiyao Liu, Wei Li, Yinghao Zhu, Shujian Gao, Yanyan Huang, Sibo Ju, Yanzhou Su, Pengcheng Chen, Wenhao Tang, Tianbin Li, Haoyu Wang, Yuanfeng Ji, Hui Sun, Shaobo Min, Liang Peng, Feilong Tang, Haochen Xue, Rulin Zhou, Chaoyang Zhang, Wenjie Li, Shaohao Rui, Weijie Ma, Xingyue Zhao, Yibin Wang, Kun Yuan, Zhaohui Lu, Shujun Wang, Jinjie Wei, Lihao Liu, Dingkang Yang, Lin Wang, Yulong Li, Haolin Yang, Yiqing Shen, Lequan Yu, Xiaowei Hu, Yun Gu, Yicheng Wu, Benyou Wang, Minghui Zhang, Angelica I. Aviles-Rivero, Qi Gao, Hongming Shan, Xiaoyu Ren, Fang Yan, Hongyu Zhou, Haodong Duan, Maosong Cao, Shanshan Wang, Bin Fu, Xiaomeng Li, Zhi Hou, Chunfeng Song, Lei Bai, Yuan Cheng, Yuandong Pu, Xiang Li, Wenhai Wang, Hao Chen, Jiaxin Zhuang, Songyang Zhang, Huiguang He, Mengzhang Li, Bohan Zhuang, Zhian Bai, Rongshan Yu, Liansheng Wang, Yukun Zhou, Xiaosong Wang, Xin Guo, Guanbin Li, Xiangru Lin, Dakai Jin, Mianxin Liu, Wenlong Zhang, Qi Qin, Conghui He, Yuqiang Li, Ye Luo, Nanqing Dong, Jie Xu, Wenqi Shao, Bo Zhang, Qiujuan Yan, Yihao Liu, Jun Ma, Zhi Lu, Yuewen Cao, Zongwei Zhou, Jianming Liang, Shixiang Tang, Qi Duan, Dongzhan Zhou, Chen Jiang, Yuyin Zhou, Yanwu Xu, Jiancheng Yang, Shaoting Zhang, Xiaohong Liu, Siqi Luo, Yi Xin, Chaoyu Liu, Haochen Wen, Xin Chen, Alejandro Lozano, Min Woo Sun, Yuhui Zhang, Yue Yao, Xiaoxiao Sun, Serena Yeung-Levy, Xia Li, Jing Ke, Chunhui Zhang, Zongyuan Ge, Ming Hu, Jin Ye, Zhifeng Li, Yirong Chen, Yu Qiao, Junjun He
Research Track A
Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the growth of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, curating and assembling such datasets is highly challenging due to the reliance on clinical e…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-03-30 · Zhangqi Jiang, Zheng Sun, Xianfang Zeng, Yufeng Yang, Xuanyang Zhang, Yongliang Wu, Wei Cheng, Gang Yu, Xu Yang, Bihan Wen
General AI
Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics fail to adequately capture visual consistency, i.e., the preservation of identity, …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-02 · Aleksandar Cvejic, Rameen Abdal, Abdelrahman Eldesokey, Bernard Ghanem, Peter Wonka
General AI
When evaluating identity-focused tasks such as personalized generation and image editing, existing vision encoders entangle object identity with background context, leading to unreliable representations and metrics. We introduce the first principled framework to address this vulnerability using Near-identity (NearID) d…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-05 · Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li
General AI
Selecting LLM-generated code candidates using LLM-generated tests is challenging because the tests themselves may be incorrect. Existing methods either treat all tests equally or rely on ad-hoc heuristics to filter unreliable tests. Yet determining test correctness requires knowing which codes are correct, creating a c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-06 · Shwai He, Guoheng Sun, Haichao Zhang, Yun Fu, Ang Li
General AI
Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-06 · Yicheng Xiao, Wenhu Zhang, Lin Song, Yukang Chen, Wenbo Li, Nan Jiang, Tianhe Ren, Haokun Lin, Wei Huang, Haoyang Huang, Xiu Li, Nan Duan, Xiaojuan Qi
General AI
Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. Current models are insufficient for fine-grained spatial manipulations, motivating a dedicated assessment suite. Our contributions are as follows: (i) We introduce SpatialEdit-Bench, a complete…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-13 · Md Tanvirul Alam
General AI
Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mappi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-13 · Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan
General AI
We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which uses a fixed mean-squared-error criterion, our approach introduces a learned discriminator to guide training. This change in objective induces a different generalized di…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-13 · Dujun Nie, Fengjiao Chen, Qi Lv, Jun Kuang, Xiaoyu Li, Xuezhi Cao, Xunliang Cai
General AI
While the shortage of explicit action data limits Vision-Language-Action (VLA) models, human action videos offer a scalable yet unlabeled data source. A critical challenge in utilizing large-scale human video datasets lies in transforming visual signals into ontology-independent representations, known as latent actions…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-13 · Seongyu Kim, Seungwoo Lee, Hyeonggon Ryu, Joon Son Chung, Arda Senocak
General AI
We address the problem of tactile localization, where the goal is to identify image regions that share the same material properties as a tactile input. Existing visuo-tactile methods rely on global alignment and thus fail to capture the fine-grained local correspondences required for this task. The challenge is amplifi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-13 · Bingyi Cao, Koert Chen, Kevis-Kokitsi Maninis, Kaifeng Chen, Arjun Karpur, Ye Xia, Sahil Dua, Tanmaya Dabral, Guangxing Han, Bohyung Han, Joshua Ainslie, Alex Bewley, Mithun Jacob, René Wagner, Washington Ramos, Krzysztof Choromanski, Mojtaba Seyedhosseini, Howard Zhou, André Araujo
General AI
Recent progress in vision-language pretraining has enabled significant improvements to many downstream computer vision applications, such as classification, retrieval, segmentation and depth prediction. However, a fundamental capability that these models still struggle with is aligning dense patch representations with …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-14 · Deyuan Liu, Peng Sun, Yansen Han, Zhenglin Cheng, Chuyan Chen, Tao Lin
General AI
The push for efficient text-to-image synthesis has moved the field toward one-step sampling, yet existing methods still face a three-way tradeoff among fidelity, inference speed, and training efficiency. Approaches that rely on external discriminators can sharpen one-step performance, but they often introduce training …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-17 · Heewon Oh
General AI
We present ArtifactNet, a lightweight framework that detects AI-generated music by reframing the problem as forensic physics -- extracting and analyzing the physical artifacts that neural audio codecs inevitably imprint on generated audio. A bounded-mask UNet (ArtifactUNet, 3.6M parameters) extracts codec residuals fro…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-17 · Tristan Kirscher, Alexandra Ertl, Klaus Maier-Hein, Xavier Coubez, Philippe Meyer, Sylvain Faisan
General AI
Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calib…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-18 · Gabriel Jason Lee, Jathurshan Pradeepkumar, Jimeng Sun
General AI
Electroencephalography (EEG) foundation models have shown strong potential for learning generalizable representations from large-scale neural data, yet their clinical deployment is hindered by distribution shifts across clinical settings, devices, and populations. Test-time adaptation (TTA) offers a promising solution …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-20 · Difan Jiao, Yilun Liu, Ye Yuan, Zhenwei Tang, Linfeng Du, Haolun Wu, Ashton Anderson
General AI
Guard models are widely used to detect harmful content in user prompts and LLM responses. However, state-of-the-art guard models rely solely on terminal-layer representations and overlook the rich safety-relevant features distributed across internal layers. We present SIREN, a lightweight guard model that harnesses the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-21 · Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Marco Huber, Andrea Atzori, Naser Damer, Fadi Boutros
General AI
Face Image Quality Assessment (FIQA) aims to assess the recognition utility of face samples and is essential for reliable face recognition (FR) systems. Existing approaches require computationally expensive procedures such as multiple forward passes, backpropagation, or additional training, and only recent work has foc…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-21 · Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Andrea Atzori, Fadi Boutros, Naser Damer
General AI
Face Image Quality Assessment is crucial for reliable face recognition systems, yet existing Vision Transformer-based approaches rely exclusively on final-layer representations, ignoring quality-relevant information captured at intermediate network depths. This paper presents the first comprehensive investigation of ho…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-22 · Adriana Aida, Walida Amer, Katarina Bankovic, Dhruv Behl, Fabian Busch, Annie Bhalla, Minh Duong, Florian Gienger, Rohan Godse, Denis Grachev, Ralf Gulde, Elisa Hagensieker, Junpeng Hu, Shivam Joshi, Tobias Knoblauch, Likith Kumar, Damien LaRocque, Keerthana Lokesh, Omar Moured, Khiem Nguyen, Christian Preyss, Ranjith Sriganesan, Vikram Singh, Carsten Sponner, Anh Tong, Dominik Tuscher, Marc Tuscher, Pavan Upputuri
General AI
Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evalu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-23 · Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge
General AI
Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchmark with private scenes and trajectories, making fair cross-model comparison impossible. Existing public benchmarks offer useful metrics such as trajectory error, aesth…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-27 · Zhongjie Duan, Hong Zhang, Yingda Chen
General AI
Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats, and runtime hooks. This fragmentation makes it difficult to reuse infrastructure across t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-27 · Emaan Bilal Khan, Amy Winecoff, Miranda Bogen, Dylan Hadfield-Menell
General AI
Foundation models are routinely fine-tuned for use in particular domains, yet safety assessments are typically conducted only on base models, implicitly assuming that safety properties persist through downstream adaptation. We test this assumption by analyzing the safety behavior of 100 models, including widely deploye…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-04-29 · Ming Li, Jie Wu, Justin Cui, Xiaojie Li, Rui Wang, Chen Chen
General AI
While preference optimization is crucial for improving visual generative models, how to effectively scale this paradigm remains largely unexplored. Current open-source preference datasets contain conflicting preference patterns, where winners excel in some dimensions but underperform in others. Naively optimizing on su…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-03-23 · Ulugbek Shernazarov, Rostislav Svitsov, Bin Shi
General AI
Fine-tuning large language models for domain-specific tasks such as medical text summarization demands substantial computational resources. Parameter-efficient fine-tuning (PEFT) methods offer promising alternatives by updating only a small fraction of parameters. This paper compares three adaptation approaches: Low-Ran…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 5.3
2026-03-23 · Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn
General AI
Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit gener…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-03-26 · Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier, Andrea Dan Ryals, Lorenzo Pollini, Mario G. C. A. Cimino
General AI
This paper presents an experimental platform for studying intentional-state attribution toward a non-humanoid robot. The system combines a simulated robot, realistic task environments, and large language model-based explanatory layers that can express the same behavior in mentalistic, teleological, or mechanistic terms…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-03-26 · Cole Walsh, Rodica Ivan
General AI
Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance comparable or superior to that of trained human raters, but have frequently been shown to be vulnerable to the influence of construct-i…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-03-30 · Ashwini Dasare, Nirmesh Shah, Ashishkumar Gudmalwar, Pankaj Wasnik
General AI
Evaluating AI-generated dubbed content is inherently multi-dimensional, shaped by synchronization, intelligibility, speaker consistency, emotional alignment, and semantic context. Human Mean Opinion Scores (MOS) remain the gold standard but are costly and impractical at scale. We present a hierarchical multimodal archi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-03-30 · Kun Tang, Xinquan Yang, Mianjie Zheng, Xuefen Liu, Xuguang Li, Xiaoqi Guo, Ruihan Chen, Linlin Shen, He Meng
General AI
The scarcity and high cost of expert annotations in dental imaging present a significant challenge for the development of AI in dentistry. DINOv3, a state-of-the-art, self-supervised vision foundation model pre-trained on 1.7 billion images, offers a promising pathway to mitigate this issue. However, its reliability wh…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-03-30 · Sujith Pulikodan, Abhayjeet Singh, Agneedh Basu, Lokesh Rady, Nihar Desai, Pavan Kumar J, Prajjwal Srivastav, Pranav D Bhat, Raghu Dharmaraju, Ritika Gupta, Sathvik Udupa, Saurabh Kumar, Sumit Sharma, Vaibhav Vishwakarma, Visruth Sanka, Dinesh Tewari, Harsh Dhand, Amrita Kamat, Sukhwinder Singh, Shikhar Vashishth, Partha Talukdar, Raj Acharya, Prasanta Kumar Ghosh
General AI
Project VAANI is an initiative to create an India-representative multi-modal dataset that comprehensively maps India's linguistic diversity, starting with 165 districts across the country in its first two phases. Speech data is collected through a carefully structured process that uses image-based prompts to encourage …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-03-31 · Wenyi Li, Renkai Luo, Yue Yu, Huan-ang Gao, Mingju Gao, Li Yuan, Chaoyou Fu, Hao Zhao
General AI
AI-assisted coding has rapidly reshaped software practice and research workflows, yet today's models still struggle to produce correct code for complex 3D geometric vision. If models could reliably write such code, the research of our community would change substantially. To measure progress toward that goal, we introd…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-03-31 · Mohammadhossein Khojasteh, Yifan Jiang, Stefano De Giorgis, Frank van Harmelen, Filip Ilievski
General AI
Analogical reasoning is a key driver of human generalization in problem-solving and argumentation. Yet, analogies between narrative structures remain challenging for machines. Cognitive engines for structural mapping are not directly applicable, as they assume pre-extracted entities, whereas LLMs' performance is sensit…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-01 · Amin Bigdeli, Mert Incesu, Negar Arabzadeh, Charles L. A. Clarke, Ebrahim Bagheri
General AI
We present ReFormeR, a pattern-guided approach for query reformulation. Instead of prompting a language model to generate reformulations of a query directly, ReFormeR first elicits short reformulation patterns from pairs of initial queries and empirically stronger reformulations, consolidates them into a compact librar…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-01 · Kawtar Zaher, Olivier Buisson, Alexis Joly
General AI
Building on existing approaches, we revisit Human-in-the-Loop Object Retrieval, a task that consists of iteratively retrieving images containing objects of a class-of-interest, specified by a user-provided query. Starting from a large unlabeled image collection, the aim is to rapidly identify diverse instances of an ob…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-02 · Feiyu Zhou, Marios Impraimakis
General AI
The wind-induced structural response forecasting capabilities of a novel transformer methodology are examined here. The model also provides a digital twin component for bridge structural health monitoring. Firstly, the approach uses the temporal characteristics of the system to train a forecasting model. Secondly, the …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-06 · David Nordström, Johan Edstedt, Georg Bökman, Jonathan Astermark, Anders Heyden, Viktor Larsson, Mårten Wadenbäck, Michael Felsberg, Fredrik Kahl
General AI
Local feature matching has long been a fundamental component of 3D vision systems such as Structure-from-Motion (SfM), yet progress has lagged behind the rapid advances of modern data-driven approaches. The newer approaches, such as feed-forward reconstruction models, have benefited extensively from scaling dataset siz…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-07 · Yulin Zou, Yan Chen, Wenyan Chen, JooYoung Park, Shivaraman Nitin, Luo Tao, Francisco Romero, Dmitrii Ustiugov
General AI
Video streaming analytics is a crucial workload for vision-language model serving, but the high cost of multimodal inference limits scalability. Prior systems reduce inference cost by exploiting temporal and spatial redundancy in video streams, but they target either the vision transformer (ViT) or the LLM with a limit…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-07 · Junbin Zhang, Meng Cao, Feng Tan, Yikai Lin, Yuexian Zou
General AI
Achieving fine-grained and structurally sound controllability is a cornerstone of advanced visual generation. Existing part-based frameworks treat user-provided parts as an unordered set and therefore ignore their intrinsic spatial and semantic relationships, which often results in compositions that lack structural int…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-07 · Lin Mu, Haiyang Wang, Li Ni, Lei Sang, Zhize Wu, Peiquan Jin, Yiwen Zhang
General AI
Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs), and recent Mixture-of-Experts (MoE) extensions further enhance flexibility by dynamically combining multiple LoRA experts. However, existing MoE-augmented LoRA methods assume that experts operate independently, often lea…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-09 · Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou, Hui Wang, Baole Fang, Yang Tian, Mulin Yu, Qiaojun Yu, Li Ma, Hengjie Li, Hanqing Wang, Jia Zeng, Jiangmiao Pang
General AI
Robotic manipulation with deformable objects represents a data-intensive regime in embodied learning, where shape, contact, and topology co-evolve in ways that far exceed the variability of rigids. Although simulation promises relief from the cost of real-world data acquisition, prevailing sim-to-real pipelines remain …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-13 · Qin Liu
General AI
Existing LLM agent frameworks lack formal semantics: there is no principled way to determine whether an agent configuration is well-formed or will terminate. We present $λ_A$, a typed lambda calculus for agent composition that extends the simply-typed lambda calculus with oracle calls, bounded fixpoints (the ReAct loop…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-13 · David Nordström, Johan Edstedt, Fredrik Kahl, Georg Bökman
General AI
Finding matching keypoints between images is a core problem in 3D computer vision. However, modern matchers struggle with large in-plane rotations. A straightforward mitigation is to learn rotation invariance via data augmentation, but it remains unclear at which stage rotation invariance should be incorporated. I…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-14 · Yinghao Qin, Mosab Bazargani, Edmund K. Burke, Carlos A. Coello Coello, Zhongmin Song, Jun Chen
General AI
This paper tackles the Electric Capacitated Vehicle Routing Problem (E-CVRP) through a bilevel optimization framework that handles routing and charging decisions separately or jointly depending on the search stage. By analyzing their interaction, we introduce a surrogate objective at the upper level to guide the search…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-14 · Junbin Su, Ziteng Xue, Shihui Zhang, Kun Chen, Weiming Hu, Zhipeng Zhang
General AI
Parameter-efficient fine-tuning (PEFT) in multimodal tracking reveals a concerning trend where recent performance gains are often achieved at the cost of inflated parameter budgets, which fundamentally erodes PEFT's efficiency promise. In this work, we introduce SEATrack, a Simple, Efficient, and Adaptive two-stream mu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-16 · Yunfu Deng, Yuhao Li, Josiah P. Hanna
General AI
In recent years, reinforcement learning (RL) has shown remarkable success in robotics when a fast and accurate simulator is available for a given task. When using RL and simulation, more simulator realism is generally beneficial but becomes harder to obtain as robots are deployed in increasingly complex and widescale d…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-17 · Eric Gan, Aryan Bhatt, Buck Shlegeris, Julian Stastny, Vivek Hebbar
General AI
As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results while evading detection. We introduce ASMR-Bench (Auditing for Sabotage in ML Research), a benchmark for evaluating the ability of auditors to detect sabotage in ML resea…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-17 · Ming-Bin Chen, Jey Han Lau, Lea Frermann
General AI
Measuring the quality of public deliberation requires evaluating not only civility or argument structure, but also the informational progress of a conversation. We introduce a framework for Conversational Information Gain (CIG) that evaluates each utterance in terms of how it advances collective understanding of the ta…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-17 · Matthew Frazier, Kostadin Damevski, Lori Pollock
General AI
Secondary school students enrolled in the AP Computer Science Principles (CSP) course commonly utilize web resources (e.g., tutorials, Q&A sites) to better understand key concepts in the curriculum. The primary obstacle to using these resources is finding information appropriate for the learning task and student's bac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-20 · Savya Khosla, Sethuraman T, Aryan Chadha, Alex Schwing, Derek Hoiem
General AI
Despite recent progress, vision-language encoders struggle with two core limitations: (1) weak alignment between language and dense vision features, which hurts tasks like open-vocabulary semantic segmentation; and (2) high token counts for fine-grained visual representations, which limits scalability to long videos. T…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-21 · Sarah Lykke Tost, Adson Lucas de Paiva Sales, Henrik Østergaard, Vaishali Dhanoa, Gabriela Molina León
General AI
We designed and implemented InvestChat, a multimodal tablet-based application that supports stock market exploration with multiple coordinated views and an LLM-powered chat. We evaluated the application with 12 novice investors. Our findings suggest that combining natural language, touch, and pen input during stock mar…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-21 · Mengting Chen, Zhengrui Chen, Yongchao Du, Zuan Gao, Taihang Hu, Jinsong Lan, Chao Lin, Yefeng Shen, Xingjian Wang, Zhao Wang, Zhengtao Wu, Xiaoli Xu, Zhengze Xu, Hao Yan, Mingzhou Zhang, Jun Zheng, Qinye Zhou, Xiaoyong Zhu, Bo Zheng
General AI
Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly efficient. First, our syst…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-21 · Jean Mercat, Sedrick Keh, Kushal Arora, Isabella Huang, Paarth Shah, Haruki Nishimura, Shun Iwase, Katherine Liu
General AI
We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize on the action training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-22 · Sina Gholami, Abdulmoneam Ali, Tania Haghighi, Ahmed Arafa, Minhaj Nur Alam
General AI
Federated learning (FL) enables collaborative model training without sharing raw data; however, the presence of noisy labels across distributed clients can severely degrade the learning performance. In this paper, we propose FedSIR, a multi-stage framework for robust FL under noisy labels. Different from existing appro…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-23 · Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu
General AI
LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models, improving performance on specialized tasks. This quality and ease of dissemination drive the emergence of a skill economy: free s…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-23 · Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik
General AI
As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We introduce MathDuels, a self-play benchmark in which models occupy …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-24 · Ashley J. Chen, Yijia Cao, Minghao Shao, Ramesh Karri, Muhammad Shafique
General AI
The emergence of large language models has enabled vibe coding, a natural language approach to programming in which users describe intent and AI generates or revises code, potentially broadening access to programming while preserving meaningful learning outcomes. We investigate its educational value through a month-lon…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-24 · Suvam Basak, Amitangshu Pal, Debopam Bhattacherjee
General AI
The May 2024 solar superstorm highlighted the vulnerability of rapidly expanding low Earth orbit (LEO) satellite networks to severe space weather events. To systematically evaluate LEO network resilience, we introduce an open-source tool, CosmicDancePro. It enables a comprehensive analysis of the effects of solar storm…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-26 · Sophie Chiang, Tom Brennan, Fethiye Irmak Dogan, Jiaee Cheong, Hatice Gunes
General AI
In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision-Language Models (VLMs), their deployment in clinical settings has raised concerns due to their lack of transparency and…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-27 · Fiza Naseer, Javed Ali Khan, Muhammad Yaqoob, Alexios Mylonas, Ishaya Gambo
General AI
Context: Software vulnerabilities pose significant security threats to software systems, especially as software is increasingly used across many areas of daily life, including health, government, and finance. Recently, transformer-based models have demonstrated promising results in automatic software vulnerability iden…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-27 · Zhihan Zhang, Lizi Liao
General AI
Chart-to-code generation converts a chart image into an executable plotting script, enabling faithful reproduction and editable visualizations. Existing methods are largely Python-centric, limiting practical use and overlooking a critical source of supervision: the same chart can be expressed by semantically equivalent…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-27 · Siavash Golkar, Jake Kovalic, Irina Espejo Morales, Samuel Sledzieski, Minhuan Li, Ksenia Sokolova, Geraud Krawezik, Alberto Bietti, Claudia Skok Gibbs, Roman Klypa, Shengwei Xiong, Francois Lanusse, Liam Parker, Kyunghyun Cho, Miles Cranmer, Tom Hehir, Michael McCabe, Lucas Meyer, Rudy Morel, Payel Mukhopadhyay, Mariel Pettee, Helen Qu, Jeff Shen, David Fouhey, Hadi Sotoudeh, Vikram Mulligan, Pilar Cossio, Sonya M. Hanson, Alisha N. Jones, Olga G. Troyanskaya, Shirley Ho
General AI
Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or for a fixed forward task. We present MIMIC, a generative multimodal foundation model trained on our newly curated and ali…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-27 · Boyang Wang, Guangyi Xu, Zhipeng Tang, Jiahui Zhang, Zezhou Cheng
General AI
Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. While SBD has been widely studied in the literature, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-d…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-27 · Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang, Zeyu Zhang, Yefei He, Yanbo Ding, Xirui Hu, Donny Y. Chen, Zhiyuan He, Yuqing Yang, Bohan Zhuang
General AI
Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns v…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-29 · Ezel Üsten, Anna Sieben, Mohcine Chraibi, Armin Seyfried
General AI
In pedestrian dynamics, the internal drive that propels individuals toward their goals is typically captured by a single, fixed parameter, the desired walking speed. This simplification overlooks that motivation fluctuates in response to changing spatial and social conditions within a crowd. This paper proposes a dynam…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-29 · Yeheng Chen, Chaoxiang Xie, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, Xiaodong Gu
General AI
LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes -- compositional code creation, i.e., building a complete, internally structured class from a specification -- remains underserved. Current evaluations a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-29 · Md Biplob Hosen, Md Alomgeer Hussein, Md Akmol Masud, Omar Faruque, Tera L Reynolds, Lujie Karen Chen
General AI
Patient portals now give individuals direct access to their electronic health records (EHRs), yet access alone does not ensure patients understand or act on the complex clinical information contained in these records. The ArchEHR-QA 2026 shared task addresses this challenge by focusing on grounded question answering ov…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-29 · Dimitris Dimakopoulos, Shay B. Cohen, Ioannis Konstas
General AI
Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next token prediction. Subsequent stages of post-training often introduce new facts outwith the parametric knowledge, giving rise to hallucinations. While it has been demonstrated that supervised fine-tuning (SFT…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-30 · Himanshu Pandey, Ratikanta Behera
General AI
In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer from two fundamental limitations, namely, spectral bias inherent in neural networks and loss imbalance arising from multiscale phenomena. This paper proposes an adaptive w…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-04-30 · Vinayak Gupta, Chih-Hao Lin, Shenlong Wang, Anand Bhattad, Jia-Bin Huang
General AI
Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and fails under sparse v…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-05-01 · Zihao Ding, Beining Wu, Jun Huang
General AI
Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private. However, joint embedding training entangles forgotten knowledge across both modalities and client gradient subspaces, hindering federated unlearning. Previous federated unlearning appr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-05-01 · Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb
General AI
Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based methods, including late fusion of image feat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-05-01 · Shradha Sharma, Swapnil Dhamal, Shweta Jain
General AI
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-04 · Qintong Xie, Edward Koh, Xavier Cadet, Peter Chin
General AI
Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We study multi-turn simultaneous bidding as a controlled testbed for such problems and propos…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-04 · Ankur Garg, Michael Stettler, Aaron Schein, Julius von Kügelgen
General AI
Causal representation learning aims to infer the high-level latent causal concepts that give rise to observed low-level measurements. This is particularly relevant for heterogeneous data from different environments or domains since distribution shifts often arise through sparse, localized changes in some of the underly…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-04 · Jui-Hui Chung, Ziyang Cai, Zihao Li, Qishuo Yin, Rohit Agarwal, Simon Park, Rodrigo Porto, Narutatsu Ri, Ziran Yang, Shange Tang, Xingyu Dang, Hongzhou Lin, Mengdi Wang, Danqi Chen, Chi Jin, Liam H Fowl, Sanjeev Arora
General AI
We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemma…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-04 · Akarsh Kumar, Phillip Isola
General AI
Training recurrent neural networks (RNNs) requires assigning credit across long sequences of computations. Standard backpropagation through time (BPTT) addresses this problem poorly: it is sequential in time, limiting parallelism, and suffers from vanishing or exploding gradients, making long-range associations difficu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-04 · Thamilvendhan Munirathinam
General AI
As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it has valid credentials) or hard-fail it (indistinguishable from any other client).…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-08 · Eder del Blanco, David Gimeno-Gómez, Eva Navas, Carlos-D. Martínez-Hinarejos, Inma Hernáez
General AI
Speech restoration through silent speech interfaces (SSIs) has emerged as a promising assistive technology for individuals with impaired or absent laryngeal voice production. Among non-invasive SSI modalities, surface electromyography (sEMG) and video-based lipreading provide complementary articulatory information, yet…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-08 · Badr AlKhamissi, Johannes Mehrer, Lara Marinov, Ahmed Abdelaal, Abdulkadir Gokce, Martin Schrimpf
General AI
Nearby neurons in cortex share similar response profiles, producing systematic spatial organization across sensory and cognitive systems. Recent topographic models reproduce aspects of this structure but remain unimodal and spatially constrain each layer separately, yielding fragmented maps that capture neither the con…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-08 · Yanming Shao, Zanxin Chen, Wenwei Lin, Mingjie Zhou, Tianxing Chen, Xiaokang Yang, Yichen Chi, Yao Mu
General AI
Human hand-object interactions encode functional intent, but direct transfer to robotic hands often fails under morphology, contact, and reachability constraints. We present SynManDex, a synthetic pipeline that uses generated human pre-grasps as affordance-aware proposals and resolves the final contacts with robot-nati…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-09 · Hangfeng Liang, Yutao Hu, Yanhan Hu, Xiaohan Wu, Wenqi Shao, Ying Fu
General AI
Low-light video enhancement (LLVE) remains a challenging task due to severe information degradation under low-illumination conditions. Recent multimodal approaches have significantly improved enhancement performance by incorporating auxiliary modalities, such as event streams and infrared images. However, these methods…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-09 · Wu Yuerong, Mingni Luo
General AI
Financial named-entity recognition (NER) is essential for translating unstructured financial reports and news into structured knowledge graphs. However, general-purpose large language models (LLMs) often misclassify financial entities or ignore domain-specific patterns. This paper investigates the use of DeepSeek-R1-8B…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-09 · Yujie Zang, Yuhang Zheng, Xian Nie, Yupeng Zheng, Shuai Tian, Songen Gu, Chen Gao, Zining Wang, Shuicheng Yan, Wenchao Ding
General AI
Contact-rich manipulation requires robots to continuously perceive and regulate evolving physical interactions under dynamic contact transitions or complex surface geometries. Recent imitation learning methods improve contact-aware control by incorporating tactile or force feedback, but they rarely model the asymmetric…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-15 · Junghun Oh, Sungyong Baik, Kyoung Mu Lee
General AI
Low-Rank Adaptation (LoRA) enables efficient adaptation of large pre-trained models to downstream tasks by parameterizing weight updates with low-rank matrices. In this paper, we investigate the limitations of the LoRA parameterization from a geometric perspective. Specifically, we show that when a full fine-tuning gra…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-15 · Mingyang Li, Yurou Liu, Jieping Ye, Bing Su, Ji-Rong Wen, Zheng Wang
General AI
In this report, we present LOGOS (Language Of Generative Objects in Science), a scientific generative language model that unifies heterogeneous tasks across the natural sciences within a single autoregressive framework based on a shared scientific grammar. It encodes diverse scientific objects and their spatial interac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-16 · Joy Bose
General AI
We introduce Darshana Graph, a corpus of over 125,000 text records spanning classical Hindu, Buddhist, and Jain philosophical traditions, drawn from public-domain and openly licensed translations of sources including the Bhagavad Gita, Brahma Sutras, principal Upanishads, the Pali Canon, and core Jain texts. Its distin…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-16 · Nils Morbitzer, Jonathan Evers, Artem Savkin, Thomas Stauner, Nassir Navab, Federico Tombari, Stefano Gasperini
General AI
Forecasting the evolution of dynamic environments is crucial for autonomous agents. While generative world models have recently achieved high photorealism in 2D video synthesis by mixing ego-motion and environmental dynamics within the image plane, they exhibit physical inconsistencies, such as morphing or vanishing ob…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-16 · Henry Bodwell, Hong Yang, John C. Simeone, Kelvin Gorospe, Bella Sullivan, Lana Huang, Jessica Gephart, Sandy Aylesworth, Molly Masterton, Naren Ramakrishnan
General AI
Illegal, unreported, and unregulated fishing (IUU) traditionally refers to fishing activities that violate applicable laws or occur in areas that lack applicable laws. We propose the term IUU+ to capture a broader suite of fisheries sector environmental and associated supply chain trade-related crimes and behaviors. Al…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-16 · Jiye Lee, Yonghun Choi, Jungdam Won
General AI
Collaborative human-object interaction shows dynamic and complex movements that require mutual anticipation and continuous adjustment between participants and the shared object. Modeling such collaborative multi-human object interaction (MHOI) scenarios requires high-quality data acquisition as a foundational step; how…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-17 · Christopher B. Womack, Shahine Bouabid, Andrei Sokolov, Popat Salunke, Glenn Flierl, Sebastian D. Eastham, Noelle E. Selin
General AI
As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenario…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 5.3
2026-06-17 · Jisoo Kim, Sangwon Baik, Taeksoo Kim, Sungjoo Kim, Junyoung Lee, Mingi Choi, Hanbyul Joo
General AI
We present a zero-shot framework for long-horizon dexterous manipulation that grounds language instructions into executable 3D task plans from calibrated multi-view RGB images. Rather than training an end-to-end policy, our system uses a vision-language model (VLM) to produce reference-frame task grounding and primitiv…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.0
2026-05-12 · Bo Yin, Qi Li, Xinchao Wang
General AI
Tool-using LLM agents fail through trajectories rather than only final responses, as they may execute unsafe tool calls, follow injected instructions, comply with harmful requests, or over-refuse benign tasks despite producing a seemingly safe answer. Existing safety-alignment signals are largely response-level or off-…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-05-12 · Aleksandr Bredikhin, Philippe Lalanda, German Vega
General AI
Human Activity Recognition (HAR) is a core task in pervasive computing systems, where models must operate under strict computational constraints while remaining robust to heterogeneous and evolving deployment conditions. Recent advances based on Transformer architectures have significantly improved recognition performa…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-05-12 · Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, Weiyang Liu
General AI
We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular values throughout traini…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-05-12 · Yo Ehara
General AI
Automatic generation of educational materials using large language models (LLMs) is becoming increasingly common, but assigning difficulty levels to such materials still requires substantial human effort. LLM-as-a-Judge has therefore attracted attention, yet disagreement with human raters remains a major challenge. We …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-05-12 · Ariel Gera, Shir Ashury-Tahan, Gal Bloch, Ohad Eytan, Assaf Toledo
General AI
We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling e…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.8
2026-05-12 · Yichen Zhang, Jun Li
General AI
The efficient operation of modern cellular networks hinges on the accurate analysis of spatio-temporal traffic data. Mastering these patterns is essential for core network functions, chiefly forecasting future load to pre-empt congestion and imputing missing values caused by sensor failures or transmission errors to en…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 4.5
2026-04-06 · Asiri Dalugoda
General AI
Agentic AI systems increasingly execute consequential actions on behalf of human principals, delegating tasks through multi-step chains of autonomous agents. No existing standard addresses a fundamental accountability gap: verifying that terminal actions in a delegation chain were genuinely authorized by a human princi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 4.5
2026-04-07 · Changxin Ke, Rui Zhang, Jiaming Guo, Yuanbo Wen, Li Ding, Shuo Wang, Xuyuan Zhu, Xiong Peng, Di Huang, Zidong Du, Xing Hu, Qi Guo, Yunji Chen
General AI
Large Language Models (LLMs) achieve strong program repair performance but often suffer from over-editing, where excessive modifications overwrite correct code and hinder bug localization. We systematically quantify its impact and introduce the precise repair task, which maximizes reuse of correct code while fixing only bu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 4.5
2026-04-16 · Natapong Nitarach
General AI
Majority voting over multiple LLM attempts improves mathematical reasoning, but correlated errors limit the effective sample size. A natural fix is to assign different reasoning strategies to different voters. The approach, Diverse Prompt Mixer, is tested on the AIMO 3 competition: 3 models, 23+ experiments, 50 IMO-lev…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 4.5
2026-04-20 · Qifan Zhang, Dongyang Ma, Tianqing Fang, Jia Li, Jing Tang, Nuo Chen, Haitao Mi, Yan Wang
General AI
Most agents today "self-evolve" by following rewards and rules defined by humans. However, this process remains fundamentally dependent on external supervision; without human guidance, the evolution stops. In this work, we train agents to possess an intrinsic meta-evolution capability to spontaneously learn about uns…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 4.5
2026-04-21 · Qingyang Zhang, Xinke Kong, Haitao Wu, Qinghua Hu, Minghao Wu, Baosong Yang, Yu Cheng, Yun Luo, Ganqu Cui, Changqing Zhang
General AI
Test-time training (TTT) adapts model parameters on unlabeled test instances during inference time, which continuously extends capabilities beyond the reach of offline training. Despite initial gains, existing TTT methods for LRMs plateau quickly and do not benefit from additional test-time compute. Without external ca…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 4.5
2026-04-28 · Wenqi Jia, Zekun Li, Abhay Mittal, Chengcheng Tang, Chuan Guo, Lezi Wang, James Matthew Rehg, Lingling Tao, Size An
General AI
Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morpholog…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 4.5
2026-04-28 · Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng Zhang, Lilin Wang
General AI
Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for trajectory sampling. H…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 4.5
2026-04-29 · Hayate Iso, Tiyasa Mitra, Sudipta Mondal, Rasoul Shafipour, Venmugil Elango, Terry Kong, Yuki Huang, Seonjin Na, Izzy Putterman, Benjamin Chislett, Maor Ashkenazi, Joseph Guman, Gerald Shen, Tugrul Konuk, Ashwath Aithal, Ritika Borkar, Ran Zilberstein, Bita Rouhani
General AI
RL post-training of frontier language models is increasingly bottlenecked by autoregressive rollout generation, making rollout acceleration a central systems challenge. Many existing efficiency methods improve throughput by changing the rollout or optimization regime, for example, through off-policy execution, replay, …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 4.5
2026-04-29 · Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt
General AI
Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images across 15 fashion houses spanning 1991-2024 that makes this cultural logic inspectable. Given a photograph of…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-03-22 · Shih-Wen Liu, Yen-Chang Chen, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang
General AI
Multi-task learning (MTL) aims to enable a single model to solve multiple tasks efficiently; however, current parameter-efficient fine-tuning (PEFT) methods remain largely limited to single-task adaptation. We introduce Free Sinewich, a parameter-efficient multi-task learning framework that enables near-zero-c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 4.3
2026-03-26 · Chengshuai Yang
General AI
Designing a computational imaging system -- selecting operators, setting parameters, validating consistency -- requires weeks of specialist effort per modality, creating an expertise bottleneck that excludes the broader scientific community from prototyping imaging instruments. We introduce spec.md, a structured specif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-03-26 · Yannick Roy
General AI
Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User x 1000', where an L…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-03-30 · Shuang Zhou, Kai Yu, Zaifu Zhan, Huixue Zhou, Min Zeng, Feng Xie, Zhiyi Sha, Rui Zhang
General AI
Epilepsy and psychogenic non-epileptic seizures often present with similar seizure-like manifestations but require fundamentally different management strategies. Misdiagnosis is common and can lead to prolonged diagnostic delays, unnecessary treatments, and substantial patient morbidity. Although prolonged video-electr…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-03-30 · N Alex Cayco Gajic, Arthur Pellegrino
General AI
Similarity measures are widely used to interpret the representational geometries used by neural networks to solve tasks. Yet, because existing methods compare the extrinsic geometry of representations in state space, rather than their intrinsic geometry, they may fail to capture subtle yet crucial distinctions between …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-03-30 · Conrad Borchers, Valdemar Švábenský, Sandesh K. Kafle, Kevin K. Tang, Jan Vykopal
General AI
Instructional alignment, the match between intended cognition and enacted activity, is central to effective instruction but hard to operationalize at scale. We examine alignment in cybersecurity simulations using multimodal traces from 23 teams (76 students) across five exercise sessions. Study 1 codes objectives and t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-03-30 · Liliang Ren, Yang Liu, Yelong Shen, Weizhu Chen
General AI
Scaling laws for large language models depend critically on the optimizer and parameterization. Existing hyperparameter transfer laws are mainly developed for first-order optimizers, and they do not structurally prevent training instability at scale. Recent hypersphere optimization methods constrain weight matrices to …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-03-31 · Badhan Mazumder, Sir-Lord Wiafe, Aline Kotoski, Vince D. Calhoun, Dong Hye Ye
General AI
Understanding how brain structure and function interact is key to explaining intelligence, yet modeling them jointly is challenging because the structural and functional connectomes capture complementary aspects of organization. We introduced Multi-scale Adaptive Graph Network (MAGNet), a Transformer-style graph neural networ…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-06 · Connor Dilgren, Sarah Wiegreffe
General AI
Latent reasoning models (LRMs) have attracted significant research interest due to their low inference cost (relative to explicit reasoning models) and theoretical ability to explore multiple reasoning paths in parallel. However, these benefits come at the cost of reduced interpretability: LRMs are difficult to monitor…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-06 · Yang Li, Qiang Sheng, Zhengjia Wang, Yehan Yang, Danding Wang, Juan Cao
General AI
The misuse of large language models (LLMs) requires precise detection of synthetic text. Existing works mainly follow binary or ternary classification settings, which can only distinguish pure human/LLM text or collaborative text at best. This remains insufficient for nuanced regulation, as the LLM-polished human t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-06 · Vadim Vashkelis, Natalia Trukhina
General AI
Mixture-of-Experts (MoE) architectures enable conditional computation by activating only a subset of model parameters for each input. Although sparse routing has been highly effective in language models and has also shown promise in vision, most vision MoE methods operate at the image or patch level. This granularity i…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-06 · Sudarshan Rajagopalan, Vishal M. Patel
General AI
Pre-trained diffusion models have enabled significant advancements in All-in-One Restoration (AiOR), offering improved perceptual quality and generalization. However, diffusion-based restoration methods primarily rely on fine-tuning or Control-Net style modules to leverage the pre-trained diffusion model's priors for A…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-07 · Maissam Barkeshli, Michael R. Douglas, Michael H. Freedman
General AI
Recent progress in artificial intelligence (AI) is unlocking transformative capabilities for mathematics. There is great hope that AI will help solve major open problems and autonomously discover new mathematical concepts. In this essay, we further consider how AI may open a grand perspective on mathematics by forging …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-07 · Yasmeen Saeed, Ahmed Sharshar, Mohsen Guizani
General AI
Detecting cyberattacks in photovoltaic (PV) monitoring and MPPT control signals requires models that are robust to bias, drift, and transient spikes, yet lightweight enough for resource-constrained edge controllers. While deep learning outperforms traditional physics-based diagnostics and handcrafted features, standard…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-07 · Zhengming Yu, Li Ma, Mingming He, Leo Isikdogan, Yuancheng Xu, Dmitriy Smirnov, Pablo Salamanca, Dao Mi, Pablo Delgado, Ning Yu, Julien Philip, Xin Li, Wenping Wang, Paul Debevec
General AI
Most digital videos are stored in 8-bit low dynamic range (LDR) formats, where much of the original high dynamic range (HDR) scene radiance is lost due to saturation and quantization. This loss of highlight and shadow detail precludes mapping accurate luminance to HDR displays and limits meaningful re-exposure in post-…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-07 · Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo, Parisa Hamedi, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani
General AI
Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise in translating low-level representations into high-level source code, most exis…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-09 · Jiayuan Ye, Vitaly Feldman, Kunal Talwar
General AI
Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on knowledge-intensive tasks. In this paper, we formalize fact memorization from an information-theoretic perspective and study how training data distributions affect fact ac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-13 · Junhao Chen, Kejun Gao, Yuehan Cui, Mingze Sun, Mingjin Chen, Shaohui Wang, Xiaoxiao Long, Fei Ma, Qi Tian, Ruqi Huang, Hao Zhao
General AI
Despite rapid progress in video generation, existing models are incapable of producing vector animation, a dominant and highly expressive form of multimedia on the Internet. Vector animations offer resolution-independence, compactness, semantic structure, and editable parametric motion representations, yet current gene…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-13 · Xingjian Ran, Shujie Zhang, Weipeng Zhong, Li Luo, Bo Dai
General AI
Generating high-fidelity 3D indoor scenes remains a significant challenge due to data scarcity and the complexity of modeling intricate spatial relations. Current methods often struggle to scale beyond training distribution to dense scenes or rely on LLMs/VLMs that lack the ability for precise spatial reasoning. Buildi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-14 · Yaru Niu, Zhenlong Fang, Binghong Chen, Shuai Zhou, Revanth Senthilkumaran, Hao Zhang, Bingqing Chen, Chen Qiu, H. Eric Tseng, Jonathan Francis, Ding Zhao
General AI
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first de…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-14 · Megha Chakraborty, Darssan L. Eswaramoorthi, Madhur Thareja, Het Riteshkumar Shah, Finlay Palmer, Aryaman Bahl, Michelle A Ihetu, Amit Sheth
General AI
AI-driven education platforms have made some progress in personalisation, yet most remain constrained to static adaptation--predefined quizzes, uniform pacing, or generic feedback--limiting their ability to respond to learners' evolving understanding. This shortfall highlights the need for systems that are both context…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-14 · Nafis Fuad Shahid, Maroof Ahmed, Md Akib Haider, Saidur Rahman Sagor, Aashnan Rahman, Md Azam Hossain
General AI
Multimodal federated learning enables privacy-preserving collaborative model training across healthcare institutions. However, a fundamental challenge arises from modality heterogeneity: many clinical sites possess only a subset of modalities due to resource constraints or workflow variations. Existing approaches addre…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-14 · Yida Niu, Xinhai Chang, Xin Liu, Ziyuan Jiao, Yixin Zhu
General AI
Robots deployed in unstructured environments must coordinate whole-body motion -- simultaneously moving a mobile base and arm -- to interact with the physical world. This coupled mobility and dexterity yields a state space that grows combinatorially with scene and object diversity, demanding datasets far larger than th…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-16 · Mitch Adler, Matthew Russo, Michael Cafarella
General AI
In the past year, researchers have started to create agentic systems that can design real-world CAD-style objects in a training-free setting, a new variety of system that we call Agent-Aided Design. Generally speaking, these systems place an agent in a feedback loop in which it can write code, compile that code to an a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-16 · Manan Gupta, Dhruv Kumar
General AI
LLM-as-judge frameworks are increasingly used for automatic NLG evaluation, yet their per-instance reliability remains poorly understood. We present a two-pronged diagnostic toolkit applied to SummEval: (1) a transitivity analysis that reveals widespread per-input inconsistency masked by low aggregate violat…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-16 · Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara, Steven McDonagh
General AI
Understanding emotions is a fundamental ability for intelligent systems to be able to interact with humans. Vision-language models (VLMs) have made tremendous progress in the last few years for many visual tasks, potentially offering a promising solution for understanding emotions. However, it is surprising that even t…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-20 · Rui Qian, Chuanhang Deng, Qiang Huang, Jian Xiong, Mingxuan Li, Yingbo Zhou, Wei Zhai, Jintao Chen, Dejing Dou
General AI
Reasoning segmentation requires models to ground complex, implicit textual queries into precise pixel-level masks. Existing approaches rely on a single segmentation token <SEG>, whose hidden state implicitly encodes both semantic reasoning and spatial localization, limiting the model's ability to explicitly …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-20 · A. Sophia Koepke, Daniil Zverev, Shiry Ginosar, Alexei A. Efros
General AI
The Platonic Representation Hypothesis suggests that neural networks trained on different modalities (e.g., text and images) align and eventually converge toward the same representation of reality. If true, this has significant implications for whether modality choice matters at all. We show that the experimental evide…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-20 · Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki, Ethan Gotlieb Wilcox
General AI
A recent study (Kuribayashi et al., 2025) has shown that human sentence processing behavior, typically measured on syntactically unchallenging constructions, can be effectively modeled using surprisal from early layers of large language models (LLMs). This raises the question of whether such advantages of internal laye…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-20 · Anda Cao, Zhuo Gou, Yi Wang, Kaixuan Chen, Yu Wang, Can Wang, Mingli Song, Jie Song
General AI
Merging multiple Low-Rank Adaptation (LoRA) experts into a single backbone is a promising approach for efficient multi-task deployment. While existing methods strive to alleviate interference via weight interpolation or subspace alignment, they rest upon the implicit assumption that all LoRA matrices contribute constru…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-20 · Haoyu Wu, Jiwen Yu, Yingtian Zou, Xihui Liu
General AI
Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing approaches are limited …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-20 · Wei Yao, Haohan Ma, Hongwen Zhang, Yunlian Sun, Liangjun Xing, Zhile Yang, Yuanjun Guo, Yebin Liu, Jinhui Tang
General AI
Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in multi-agent coordination, and limited generalization across objects. In this paper, we present SynAgent, a unified framework that enables scalable and physicall…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-21 · Isaiah Thompson, Tanmay Sen, Ritwik Bhattacharya
General AI
Modern distributed systems generate massive volumes of log data that are critical for detecting anomalies and cyber threats. However, in real world settings, these logs are often distributed across multiple organizations and cannot be centralized due to privacy and security constraints. Existing log anomaly detection m…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-21 · Nikita Kister, Pradyumna YM, István Sárándi, Jiayi Wang, Anna Khoreva, Gerard Pons-Moll
General AI
Training embodied agents to understand 3D scenes as humans do requires large-scale data of people meaningfully interacting with diverse environments, yet such data is scarce. Real-world motion capture is costly and limited to controlled settings, while existing synthetic datasets rely on simple geometric heuristics tha…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-21 · Carles Navarro, Philipp Tholke, Gianni de Fabritiis
General AI
Structure-based drug discovery faces the dual challenge of accurately capturing 3D protein-ligand interactions while navigating ultra-large chemical spaces to identify synthetically accessible candidates. In this work, we present a unified framework that addresses these challenges by combining contrastive 3D structure …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-21 · Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge
General AI
Scaling humanoid foundation models is bottlenecked by the scarcity of robotic data. While massive egocentric human data offers a scalable alternative, bridging the cross-embodiment chasm remains a fundamental challenge due to kinematic mismatches. We introduce UniT (Unified Latent Action Tokenizer via Visual Anchoring)…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-22 · William Scarbro, Ravi Mangal
General AI
Autonomous systems that rely on learned perception can make unsafe decisions when sensor readings are misclassified. We study shielding for this setting: given a proposed action, a shield blocks actions that could violate safety. We consider the common case where system dynamics are known but perception uncertainty mus…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-22 · Mohammed Zeehan Saleheen, Markus Wagner, Reza Razzaghi, Hao Wang
General AI
Reliable operation is a central motivation for deploying renewable-based microgrids. This paper presents a systematic rapid review that positions reliability as the central organizing principle for microgrid design. Specifically, this review systematically synthesizes recent literature to examine how planning assumptio…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-23 · Hao-Yu Hsu, Tianhang Cheng, Jing Wen, Alexander G. Schwing, Shenlong Wang
General AI
Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts pu…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-23 · Yanran Zhang, Wenzhao Zheng, Yifei Li, Bingyao Yu, Yu Zheng, Lei Chen, Jiwen Lu, Jie Zhou
General AI
In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent, development, these two fields have evolved distinct architectural paradigms: the former predominantly relies on generative networks, while the latter favors discrimin…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-27 · Mufhumudzi Muthivhi, Terence L. van Zyl
General AI
There has been growing interest in studying the complexity of Rectified Linear Unit (ReLU) based activation networks. Recent work investigates the evolution of the number of piecewise-linear partitions (linear regions) that are formed during training. However, current research is limited to examining the complexity of …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-28 · Bangzhao Shu, Arinjay Singh, Mai ElSherief
General AI
Large language models (LLMs) are increasingly used in emotionally sensitive human-AI applications, yet little is known about how emotion recognition is internally represented. In this work, we investigate the internal mechanisms of emotion recognition in LLMs using sparse autoencoders (SAEs). By analyzing sparse featur…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-29 · Catherine Liu, Tao Long, Asya Vaisberg, Chau Vu, Jiaju Ma, Jingyi Li
General AI
Creativity support tools (CSTs) aim to elevate the quality of artists' creative processes and artifacts. Yet most current CST evaluations overlook temporal and social aspects of tool use. To address this gap, we present a longitudinal, group-based CST evaluation through a three-week deployment of ArtKrit, a computation…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-29 · Evangelia Kopadi, Dimitris Kalles
General AI
Can Neural Assemblies -- groups of neurons that fire together and strengthen through co-activation -- learn the direction of causal influence between variables? While established as a computationally general substrate for classification, parsing, and planning, neural assemblies have not yet been shown to internalize ca…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-29 · Yuxuan Tian, Yurun Jin, Bin Yu, Yukun Shi, Hao Wu, Chi Harold Liu, Kai Chen, Cong Huang
General AI
Robotic manipulation critically requires reasoning about future spatial-temporal interactions, yet existing VLA policies and world-model-enhanced policies do not fully model action-relevant spatial-temporal interaction structure. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-tem…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-29 · Zhuofan Lou, Shihang Zhang, Fangle Zhu, Shengjie Ye, Pingyu Wang
General AI
We propose UAPAR, an Uncertainty-Aware Pedestrian Attribute Recognition framework. To the best of our knowledge, this is the first EDL-based uncertainty-aware framework for pedestrian attribute recognition (PAR). Unlike conventional deterministic methods, which fail to assess prediction reliability on low-quality sampl…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-30 · Maykon Nunes, Emanuel Coutinho, Carla Bezerra, Ivan Machado
General AI
Angular is one of the most widely adopted frameworks for developing large-scale, dynamic web applications. As projects increase in scope and complexity, developers face growing challenges in managing architecture and maintaining clean, modular code. These challenges often lead to design flaws, commonly referred to as c…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-30 · Tianyuan Wu, Chaokun Chang, Lunxi Cao, Wei Gao, Wei Wang
General AI
Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback, yet existing approaches fall into two extremes: application-l…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-30 · Kehong Gong, Zhengyu Wen, Dao Thien Phong, Mingxi Xu, Weixia He, Qi Wang, Ning Zhang, Zhengyu Li, Guanli Hou, Dongze Lian, Xiaoyu He, Mingyuan Zhang, Hanwang Zhang
General AI
Recent methods for arbitrary-skeleton motion capture from monocular video follow a factorized pipeline, where a Video-to-Pose network predicts joint positions and an analytical inverse-kinematics (IK) stage recovers joint rotations. While effective, this design is inherently limited, since joint positions do not fully …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 4.3
2026-04-30 · Junyoung Lee, Sookwan Han, Jeonghwan Kim, Inhee Lee, Mingi Choi, Jisoo Kim, Wonjung Woo, Hanbyul Joo
General AI
Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where multiple humans and robots share a workspace, acting concurrently on interleaved subtasks with tight spatial and temporal coupling. This regime remains underexplored because …
- Review
- pending
- Role
- unreviewed
- Read
- soon