arxiv
Score 31.5
2026-03-12 · Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin
Research Track A · General AI
Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 29.4
2026-03-20 · Zijian Lu, Yiping Zuo, Yupeng Nie, Xin He, Weibei Fan, Chen Dai
Research Track B · General AI
Despite rapid progress in multimodal GUI agents, reusable skill acquisition remains difficult because on-demand generated skills often leave action semantics, state assumptions, and success criteria implicit. This makes them brittle to execution errors, hard to verify, and difficult to repair. We present ContractSkill,…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 25.9
2026-03-16 · Zhaohui Geoffrey Wang
Research Track A · General AI
A critical failure mode of current lifelong agents is not lack of knowledge, but the inability to decide how to reason. When an agent encounters "Is this coin fair?" it must recognize whether to invoke frequentist hypothesis testing or Bayesian posterior inference, frameworks that are epistemologically incompatible. M…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 24.8
2026-03-07 · Yunteng Tan, Zhi Gao, Xinxiao Wu
Research Track B · General AI
Large language model-based web agents have shown strong potential in automating web interactions through advanced reasoning and instruction following. While retrieval-based memory derived from historical trajectories enables these agents to handle complex, long-horizon tasks, current methods struggle to generalize acro…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 24.2
2026-03-20 · Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette
Research Track B · General AI
Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing L…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 24.0
2026-03-13 · Hongyang Chen, Zhongwu Sun, Hongfei Ye, Kunchi Li, Xuemin Lin
Research Track A · General AI
Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting, a critical limitation of the static pre-training paradigm inherent to modern LLMs. This survey presents a comprehensiv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 23.8
2026-03-23 · Shoubin Yu, Lei Shu, Antoine Yang, Yao Fu, Srinivas Sunkara, Maria Wang, Jindong Chen, Mohit Bansal, Boqing Gong
Research Track B · General AI
Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution. However, current web-agent benchmarks suffer from a critical limitation: they focus entirely on web-based interaction and perception, lacking grounding in the user's real-world physical surroundings. This li…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 23.8
2026-03-26 · Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Yong Liu, Chi Zhang
General AI
This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) in solving real-world financial problems through tool invocation of financial model context protocols. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic us…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 21.6
2026-03-26 · Ünsal Öztürk, Hatef Otroshi Shahreza, Sébastien Marcel
General AI
Multimodal Large Language Models (MLLMs) have recently been explored as face verification systems that determine whether two face images are of the same person. Unlike dedicated face recognition systems, MLLMs approach this task through visual prompting and rely on general visual and reasoning abilities. However, the d…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 21.6
2026-03-26 · Cristian Lupascu, Alexandru Lupascu
Research Track A · General AI
Large Language Model-based agents increasingly operate in high-stakes, multi-turn settings where factual grounding is critical, yet their memory systems typically rely on flat key-value stores or plain vector retrieval with no mechanism to track the provenance or trustworthiness of stored knowledge. We present Elephant…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 21.5
2026-03-13 · Orit Shahnovsky, Rotem Dror
Research Track B · General AI
Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why they fail or how they plan. This paper addresses this gap by formally treating web tasks as sequ…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 20.8
2026-03-26 · Dingjie Song, Tianlong Xu, Yi-Fan Zhang, Hang Li, Zhiling Yan, Xing Fan, Haoyang Li, Lichao Sun, Qingsong Wen
General AI
Assessing student handwritten scratchwork is crucial for personalized educational feedback but presents unique challenges due to diverse handwriting, complex layouts, and varied problem-solving approaches. Existing educational NLP primarily focuses on textual responses and neglects the complexity and multimodality inhe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 20.6
2026-03-26 · Abdullah Hamdi, Changchun Yang, Xin Gao
General AI
Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 20.6
2026-03-26 · Liang Zhang, Yu Fu, Xinyi Jin
General AI
Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship us…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 20.6
2026-03-26 · André G. Viveiros, Nuno Gonçalves, Matthias Lindemann, André Martins
General AI
While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. While recent approaches…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 20.6
2026-03-26 · Jiwook Han, Geo Ahn, Youngrae Kim, Jinwoo Choi
General AI
Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memorize dataset-specific s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 20.6
2026-03-26 · Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, Guanjun Jiang
General AI
Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or seq…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 20.2
2026-03-15 · Mohamed Aghzal, Gregory J. Stein, Ziyu Yao
Research Track B · General AI
Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze w…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.6
2026-03-26 · Vishal Narnaware, Animesh Gupta, Kevin Zhai, Zhenyi Wang, Mubarak Shah
General AI
Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet the architectures remain prone to multimodal hallucinations. This structural vulnerability stems from an algorithmic flaw: the decoder ranks candidate tokens based on textual likelihood without …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.5
2026-01-12 · Jihong Wang, Jiamu Zhou, Weiming Zhang, Weiwen Liu, Zhuosheng Zhang, Xingyu Lou, Weinan Zhang, Huarong Deng, Jun Wang
Research Track B · General AI
With the advancement of vision-language models, web automation has made significant progress. However, deploying autonomous agents in real-world settings remains challenging, primarily due to site heterogeneity, where generalist models lack domain-specific priors for diverse interfaces, and long-horizon instability, ch…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.5
2026-03-09 · Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang
Research Track B · General AI
Modern agents powered by thinking LLMs achieve high accuracy through long chain-of-thought reasoning but incur substantial inference costs. While many LLMs now support configurable reasoning levels (e.g., high/medium/low), static strategies are often ineffective: using low-effort modes at every step leads to significan…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 19.4
2026-03-15 · Xudong Wang, Gan Li, Zhiyu Liu, Yao Wang, Lianqing Liu, Zhi Han
Research Track A · General AI
Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong V…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.8
2026-03-24 · Yenchia Feng, Chirag Sharma, Karime Maamari
Research Track B · General AI
Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows remains an open problem. In long-horizon settings, agents frequently suffer from cascading errors and environmental stochasticity; a single misstep in a dynamic interface can lead to task failure, resulting in h…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.8
2026-03-26 · Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang, Chao Zhang, Chen Zhang, Yuhang Zang, Fei Yuan, Jiakang Yuan, Jiashuo Yu, Jinhui Yin, Haochen Ye, Qian Yao, Bowen Yang, Danni Yang, Kaichen Yang, Ziang Yan, Jun Xu, Yicheng Xu, Wanghan Xu, Xuenan Xu, Chao Xu, Ruiliang Xu, Shuhao Xing, Long Xing, Xinchen Xie, Ling-I Wu, Zijian Wu, Zhenyu Wu, Lijun Wu, Yue Wu, Jianyu Wu, Wen Wu, Fan Wu, Xilin Wei, Qi Wei, Bingli Wang, Rui Wang, Ziyi Wang, Zun Wang, Yi Wang, Haomin Wang, Yizhou Wang, Lintao Wang, Yiheng Wang, Longjiang Wang, Bin Wang, Jian Tong, Zhongbo Tian, Huanze Tang, Chen Tang, Shixiang Tang, Yu Sun, Qiushi Sun, Xuerui Su, Qisheng Su, Chenlin Su, Demin Song, Jin Shi, Fukai Shang, Yuchen Ren, Pengli Ren, Xiaoye Qu, Yuan Qu, Jiantao Qiu, Yu Qiao, Runyu Peng, Tianshuo Peng, Jiahui Peng, Qizhi Pei, Zhuoshi Pan, Linke Ouyang, Wenchang Ning, Yichuan Ma, Zerun Ma, Ningsheng Ma, Runyuan Ma, Chengqi Lyu, Haijun Lv, Han Lv, Lindong Lu, Kuikun Liu, Jiangning Liu, Yuhong Liu, Kai Liu, Hongwei Liu, Zhoumianze Liu, Mengjie Liu, Ziyu Liu, Wenran Liu, Yang Liu, Liwei Liu, Kaiwen Liu, Junyao Lin, Junming Lin, Tianyang Lin, Dahua Lin, Jianze Liang, Linyang Li, Peiji Li, Zonglin Li, Zehao Li, Pengze Li, Guoyan Li, Lingkai Kong, Linglin Jing, Zhenjiang Jin, Feifei Jiang, Qian Jiang, Junhao Huang, Zixian Huang, Haian Huang, Zhouqi Hua, Han Hu, Linfeng Hou, Yinan He, Conghui He, Tianyao He, Xu Guo, Qipeng Guo, Aijia Guo, Yuzhe Gu, Lixin Gu, Jingyang Gong, Qiming Ge, Jiaye Ge, Songyang Gao, Jianfei Gao, Xinyu Fang, Caihua Fan, Yue Fan, Yanhui Duan, Zichen Ding, Shengyuan Ding, Xuanlang Dai, Erfei Cui, Ganqu Cui, Pei Chu, Tao Chu, Guangran Cheng, Yu Cheng, Kai Chen, Yongkang Chen, Chiyu Chen, Guanzhou Chen, Qiaosheng Chen, Sitao Chen, Xin Chen, Haojiong Chen, Yicheng Chen, Weihan Cao, Yuhang Cao, Qinglong Cao, Lei Bai
General AI
We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is aug…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 18.4
2026-03-19 · Minhua Lin, Zhiwei Zhang, Hanqing Lu, Hui Liu, Xianfeng Tang, Qi He, Xiang Zhang, Suhang Wang
General AI
Memory-augmented LLM agents maintain external memory banks to support long-horizon interaction, yet most existing systems treat construction, retrieval, and utilization as isolated subroutines. This creates two coupled challenges: strategic blindness on the forward path of the memory cycle, where construction and retri…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.8
2026-03-26 · Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Yuanheng Zhu, Dongbin Zhao
General AI
On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matching to a one-token sig…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.4
2026-03-17 · Jian Yang, Wei Zhang, Shawn Guo, Zhengmao Ye, Lin Jing, Shark Liu, Yizhi Li, Jiajun Wu, Cening Liu, X. Ma, Yuyang Song, Siwei Wu, Yuwen Li, L. Liao, T. Zheng, Ziling Huang, Zelong Huang, Che Liu, Yan Xing, Renyuan Li, Qingsong Cai, Hanxu Yan, Siyue Wang, Shikai Li, Jason Klein Liu, An Huang, Yongsheng Kang, Jinxing Zhang, Chuan Hao, Haowen Wang, Weicheng Gu, Ran Tao, Mingjie Tang, Peihao Wu, Jianzhou Wang, Xianglong Liu, Weifeng Lv, Bryan Dai
General AI
In this report, we introduce the IQuest-Coder-V1 series (7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic through different phases of the pipe…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.6
2026-03-26 · Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava
General AI
We present an empirical study of how far general-purpose coding agents, without hardware-specific training, can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.6
2026-03-26 · Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz
General AI
Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-referenced overhead imagery, enabling GPS-denied localization and navigation. Existing methods almost universally formulate CVGL as an image-retrieval problem in a contrastively trained embedding space. This ties per…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.5
2026-03-22 · Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren
Research Track A · General AI
The continual learning literature has rapidly shifted from traditional class incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substanti…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.4
2026-03-15 · Jiayuan Du, Yuebing Song, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu, Qijun Chen
Research Track A · General AI
End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 16.0
2026-03-06 · Xudong Wang, Jiahua Dong, Baichen Liu, Qi Lyu, Lianqing Liu, Zhi Han
Research Track A · General AI
Embodied navigation agents powered by large language models have shown strong performance on individual tasks but struggle to continually acquire new navigation skills and suffer from catastrophic forgetting. We formalize this challenge as lifelong embodied navigation learning (LENL), where an agent is required to a…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.8
2026-03-25 · Risa Shinoda, Kaede Shiohara, Nakamasa Inoue, Kuniaki Saito, Hiroaki Santo, Fumio Okura
General AI
Understanding animal species from multimodal data poses an emerging challenge at the intersection of computer vision and ecology. While recent biological models, such as BioCLIP, have demonstrated strong alignment between images and textual taxonomic information for species identification, the integration of the audio …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 15.8
2026-03-25 · Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi, Jacob Andreas, Yoon Kim
General AI
Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mode. While this is generally not a problem for benchmark-style evaluations that assume one correct answer, many real-wor…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.6
2026-03-25 · Bingqing Wei, Zhongyu Xia, Dingai Liu, Xiaoyu Zhou, Zhiwei Lin, Yongtao Wang
General AI
Vision-language models (VLMs) have shown remarkable general capabilities, yet embodied agents built on them fail at complex tasks, often skipping critical steps, proposing invalid actions, and repeating mistakes. These failures arise from a fundamental gap between the static training data of VLMs and the physical inter…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.6
2026-03-26 · Zirui Zhang, Haoyu Dong, Kexin Pei, Chengzhi Mao
General AI
Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.6
2026-03-26 · Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
General AI
Vision-language-action models have reshaped autonomous driving to incorporate language into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To addr…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.2
2026-03-20 · Xuanwang Zhang, Yuteng Han, Jinnan Qi, Mulong Xie, Zhen Wu, Xinyu Dai
Research Track B · General AI
Despite significant advances in autonomous web navigation, current methods remain far from human-level performance in complex web environments. We argue that this limitation stems from Topological Blindness, where agents are forced to explore via trial-and-error without access to the global topological structure of the…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-03-22 · Shenghan Chen, Yiming Liu, Yanzhen Wang, Yujia Wang, Xiankai Lu
Research Track A · General AI
Balancing the performance trade-off on long-tail (LT) data distributions remains a long-standing challenge. In this paper, we posit that this dilemma stems from a phenomenon called "tail performance degradation" (the model tends to severely overfit on head classes while quickly forgetting tail classes) and pose a solution …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 14.6
2026-03-25 · Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu
Research Track A · General AI
Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks remains fundamentally limited by the absence of long-term memory, causal failure attribution, and dynamic intervention c…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.6
2026-03-26 · Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li
General AI
Human driving behavior is inherently personal, shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectiv…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.0
2026-03-22 · Liang Ding
Research Track B · General AI
LLM-as-Judge evaluation fails agent tasks because a fixed rubric cannot capture what matters for this task: code debugging demands Correctness and Error Handling; web navigation demands Goal Alignment and Action Efficiency. We present ADARUBRIC, which closes this gap by generating task-specific evaluation rubrics on th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.6
2026-03-26 · Linyue Pan, Lexiao Zou, Shuo Guo, Jingchen Ni, Hai-Tao Zheng
General AI
Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead be externaliz…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.6
2026-03-26 · Liping Yi, Zhiming Zhao, Qinghua Hu
General AI
Social learning highlights that learning agents improve not in isolation, but through interaction and structured knowledge exchange with others. When introduced into machine learning, this principle gives rise to social machine learning (SML), where multiple agents collaboratively learn by sharing abstracted knowledge.…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.6
2026-03-26 · Jinbo Xing, Zeyinzi Jiang, Yuxiang Tuo, Chaojie Mao, Xiaotang Gai, Xi Chen, Jingfeng Zhang, Yulin Pan, Zhen Han, Jie Xiao, Keyu Yan, Chenwei Xie, Chongyang Zhong, Kai Zhu, Tong Shen, Lianghua Huang, Yu Liu, Yujiu Yang
General AI
Recent unified models have made unprecedented progress in both understanding and generation. However, while most of them accept multi-modal inputs, they typically produce only single-modality outputs. This challenge of producing interleaved content is mainly due to training data scarcity and the difficulty of modeling …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.5
2026-03-24 · Xinyao Wu, Zhe Xu, Cheng Chen, Jiawei Ma, Yefeng Zheng, Raymond Kai-yu Tong
Research Track A · General AI
Class-incremental learning (CIL) in medical image-guided diagnosis requires retaining prior diagnostic knowledge while adapting to newly emerging disease categories, which is critical for scalable clinical deployment. This problem is particularly challenging due to heterogeneous data and privacy constraints that preven…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 13.4
2026-03-19 · Haochen Zhao, Shaoyang Cui
Research Track B · General AI
Autonomous web agents such as OpenClaw are rapidly moving into high-impact real-world workflows, but their security robustness under live network threats remains insufficiently evaluated. Existing benchmarks mainly focus on static sandbox settings and content-level prompt attacks, which leaves a practical gap …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 13.0
2026-03-04 · Hung Tran, Langston Nashold, Rayan Krishnan, Antoine Bigeard, Alex Gu
Research Track B · General AI
Code generation has emerged as one of AI's highest-impact use cases, yet existing benchmarks measure isolated tasks rather than the complete "zero-to-one" process of building a working application from scratch. We introduce Vibe Code Bench, a benchmark of 100 web application specifications (50 public validation, 50 hel…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 12.8
2026-03-24 · Qianlong Lan, Anuj Kaul
Research Track B · General AI
Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage spli…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.8
2026-03-26 · Jiaqing Zhang, Hao Wang, Mingjia Yin, Bo Chen, Qinglin Jia, Rui Zhou, Ruiming Tang, ChaoYi Ma, Enhong Chen
Research Track A · General AI
Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture comparison or iteration is prohibitively expensive, severely slowing down model deve…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 12.6
2026-03-26 · Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li, Alex Smola
General AI
Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which condi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.6
2026-03-26 · Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang, Mingxi Cheng, Qi Dai, Yuqing Yang, Lili Qiu, Zhendong Wang, Zhengyuan Yang, Xue Yang, Lijuan Wang, Ji Li, Chong Luo
General AI
Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-wo…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.6
2026-03-26 · Hai X. Pham, David T. Hoffmann, Ricardo Guerrero, Brais Martinez
General AI
Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard neg…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.6
2026-03-26 · Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang
General AI
The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteB…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.5
2026-03-22 · Alfred Shen, Aaron Shen
Research Track A · General AI
Current AI agent frameworks commit early to a single interaction protocol, a fixed tool integration strategy, and static user models, limiting their deployment across diverse interaction paradigms. To address these constraints, we introduce STEM Agent (Self-adapting, Tool-enabled, Extensible, Multi-agent), a modular ar…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 12.0
2026-03-22 · Liang Ding
Research Track B · General AI
LLM agents fail on the majority of real-world tasks: GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and scores below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024). Yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER,…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 12.0
2026-03-24 · Wanying Mo, Jijia Lai, Xiaoming Wang
Research Track B · General AI
Browser agents built on LLMs can act in web interfaces, yet most remain confined to a single chat surface (e.g., a sidebar). This mismatch with real browsing can increase context-switching and reduce user control. We introduce IntentWeave, a design space of ten spatial paradigms for embedding agentic assistanc…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.8
2026-02-01 · Alberto Castelo, Zahra Zanjani Foumani, Ailin Fan, Keat Yang Koay, Vibhor Malik, Yuanzheng Zhu, Han Li, Meysam Feghhi, Ronie Uliana, Shuang Xie, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Lingyun Wang, Zhong Wu
Research Track B · General AI
A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model agents op…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.6
2026-03-26 · Moiz Sadiq Awan, Muhammad Haris Noor, Muhammad Salman Munaf
Research Track A · General AI
Automated benchmarks dominate the evaluation of large language models, yet no systematic study has compared user satisfaction, adoption motivations, and frustrations across competing platforms using a consistent instrument. We address this gap with a cross-platform survey of 388 active AI chat users, comparing satisfac…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.6
2026-03-26 · Sk Miraj Ahmed, Xi Yu, Yunqi Li, Yuewei Lin, Wei Xu
General AI
Accurate biodiversity identification from large-scale field data is a foundational problem with direct impact on ecology, conservation, and environmental monitoring. In practice, the core task is taxonomic prediction: inferring order, family, genus, or species from imperfect inputs such as specimen images, DNA barcode…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.6
2026-03-26 · Lei Wang, YuXin Song, Ge Wu, Haocheng Feng, Hang Zhou, Jingdong Wang, Yaxing Wang, Jian Yang
General AI
Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applications such as personalized advertising and virtual try-on. In practice, existing R2V methods typically introduce additional high-level seman…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.5
2026-03-11 · Hyungjoo Chae, Jungsoo Park, Alan Ritter
Research Track B · General AI
Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites in…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 11.4
2026-03-14 · Seokmin Lee, Yunghee Lee, Byeonghyun Pak, Byeongju Woo
General AI
For robotic agents operating in dynamic environments, learning visual state representations from streaming video observations is essential for sequential decision making. Recent self-supervised learning methods have shown strong transferability across vision tasks, but they do not explicitly address what a good visual …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 11.4
2026-03-15 · Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee
General AI
Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improve LALM reasoning. We introduce three strategies using diverse i…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.0
2026-01-14 · Saber Zerhoudi, Michael Granitzer
Research Track B · General AI
A fundamental tension exists between the demand for sophisticated AI assistance in web search and the need for user data privacy. Current centralized models require users to transmit sensitive browsing data to external services, which limits user control. In this paper, we present a browser extension that provides a vi…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 11.0
2026-03-04 · Soroush Nasiriany, Sepehr Nasiriany, Abhiram Maddukuri, Yuke Zhu
Research Track A · General AI
Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present Rob…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.8
2025-12-08 · Alisha Ukani, Hamed Haddadi, Ali Shahin Shamsabadi, Peter Snyder
Research Track B · General AI
This paper presents a systematic evaluation of the privacy behaviors and attributes of eight recent, popular browser agents. Browser agents are software that automate Web browsing using large language models and ancillary tooling. However, the automated capabilities that make browser agents powerful also make them high…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.8
2026-03-03 · Patrick J. Mineault, Thomas L. Griffiths, Sean Escola
Research Track A · General AI
We propose that the jagged intelligence landscape of modern AI systems arises from a missing training signal that we call "cognitive dark matter" (CDM): brain functions that meaningfully shape behavior yet are hard to infer from behavior alone. We identify key CDM domains - metacognition, cognitive flexibility, episodic …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.8
2026-03-23 · Donald Shenaj, Federico Errica, Antonio Carta
General AI
Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the pers…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.8
2026-03-24 · Wei Sun, Ting Wang, Xinran Tian, Wanshun Lan, Xuhan Feng, Haoyue Li, Fangxin Wang
General AI
Existing LLM-based Kubernetes diagnostic systems cannot learn from operational experience, operating on static knowledge bases without improving from past resolutions. We present MetaKube, an experience-aware LLM framework through three synergistic innovations: (1) an Episodic Pattern Memory Network (EPMN) that abstrac…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 10.6
2026-03-26 · Xiaofeng Mao, Shaohao Rui, Kaining Ying, Bo Zheng, Chuanhao Li, Mingmin Chi, Kaipeng Zhang
General AI
Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.8
2026-03-25 · Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi
General AI
Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.8
2026-03-26 · Mohamed Eltahir, Ahmed O. Ibrahim, Obada Siralkhatim, Tabarak Abdallah, Sondos Mohamed
Research Track A · General AI
Vision-Language Models (VLMs) are powerful open-set reasoners, yet their direct use as anomaly detectors in video surveillance is fragile: without calibrated anomaly priors, they alternate between missed detections and hallucinated false alarms. We argue the problem is not the VLM itself but how it is used. VLMs should…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-03-25 · Yupei Li, Shuaijie Shao, Manuel Milling, Björn Schuller
General AI
Current audio deepfake detection has achieved remarkable performance using diverse deep learning architectures such as ResNet, and has seen further improvements with the introduction of large models (LMs) like Wav2Vec. The success of large language models (LLMs) further demonstrates the benefits of scaling model parame…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-03-26 · Mingmeng Geng, Yuhang Dong, Thierry Poibeau
General AI
Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Due to the…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-03-26 · Jiabin Hua, Hengyuan Xu, Aojie Li, Wei Cheng, Gang Yu, Xingjun Ma, Yu-Gang Jiang
General AI
Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off b…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.6
2026-03-26 · Haoyan Yang, Mario Xerri, Solha Park, Huajian Zhang, Yiyang Feng, Sai Akhil Kogilathota, Jiawei Zhou
General AI
As large language models (LLMs) continue to advance, improving them solely through human supervision is becoming increasingly costly and limited in scalability. As models approach human-level capabilities in certain domains, human feedback may no longer provide sufficiently informative signals for further improvement. …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.5
2026-02-27 · Abisheka Pitumpe, Amir Rahmati
Research Track B · General AI
Job-based smishing scams, where victims are recruited under the guise of remote job opportunities, represent a rapidly growing and understudied threat within the broader landscape of online fraud. In this paper, we present Anansi, the first scalable, end-to-end measurement pipeline designed to systematically engage wit…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 9.0
2026-03-23 · Yuze Qin, Qingyong Li, Zhiqing Guo, Wen Wang, Yan Liu, Yangli-ao Geng
General AI
Precipitation nowcasting is critical for disaster mitigation and aviation safety. However, radar-only models frequently suffer from a lack of large-scale atmospheric context, leading to performance degradation at longer lead times. While integrating meteorological variables predicted by weather foundation models offers…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.8
2026-03-23 · Alexandra Zelenin, Alexandra Zhuravlyova
General AI
Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction, but its forward pass requires the row-wise norm of W + sBA, a computation that every major framework we surveyed implements by materializing the dense [d_out, d_in] product BA. At d_in = 8192 and rank r = 384, a sin…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 8.8
2026-03-25 · Matan Ben-Yosef, Tavi Halperin, Naomi Ken Korem, Mohammad Salama, Harel Cain, Asaf Joseph, Anthony Chen, Urska Jelercic, Ofir Bibi
General AI
Controlling video and audio generation requires diverse modalities, from depth and pose to camera trajectories and audio transformations, yet existing approaches either train a single monolithic model for a fixed set of controls or introduce costly architectural changes for each new modality. We introduce AVControl, a …
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.8
2026-03-25 · Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna
General AI
Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.6
2026-03-26 · Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier, Andrea Dan Ryals, Lorenzo Pollini, Mario G. C. A. Cimino
General AI
This paper presents an experimental platform for studying intentional-state attribution toward a non-humanoid robot. The system combines a simulated robot, realistic task environments, and large language model-based explanatory layers that can express the same behavior in mentalistic, teleological, or mechanistic terms…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.6
2026-03-26 · Cole Walsh, Rodica Ivan
General AI
Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable or superior to those of trained human raters, but have frequently been demonstrated to be vulnerable to the influence of construct-i…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.8
2026-03-23 · Ulugbek Shernazarov, Rostislav Svitsov, Bin Shi
General AI
Fine-tuning large language models for domain-specific tasks such as medical text summarization demands substantial computational resources. Parameter-efficient fine-tuning (PEFT) methods offer promising alternatives by updating only a small fraction of parameters. This paper compares three adaptation approaches - Low-Ran…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 7.8
2026-03-23 · Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn
General AI
Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit gener…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.6
2026-03-26 · Chengshuai Yang
General AI
Designing a computational imaging system -- selecting operators, setting parameters, validating consistency -- requires weeks of specialist effort per modality, creating an expertise bottleneck that excludes the broader scientific community from prototyping imaging instruments. We introduce spec.md, a structured specif…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.6
2026-03-26 · Yannick Roy
General AI
Code production is now a commodity; the bottleneck is knowing what to build and proving it works. We present the Kitchen Loop, a framework for autonomous, self-evolving software built on a unified trust model: (1) a specification surface enumerating what the product claims to support; (2) 'As a User x 1000', where an L…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.8
2026-03-22 · Shih-Wen Liu, Yen-Chang Chen, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang
General AI
Multi-task learning (MTL) aims to enable a single model to solve multiple tasks efficiently; however, current parameter-efficient fine-tuning (PEFT) methods remain largely limited to single-task adaptation. We introduce Free Sinewich, a parameter-efficient multi-task learning framework that enables near-zero-c…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 6.8
2026-03-25 · Danil Tokhchukov, Aysel Mirzoeva, Andrey Kuznetsov, Konstantin Sobolev
General AI
In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can significantly improve the performance of DiT blocks. Building on this i…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.8
2026-03-25 · Yihan Wang, Jia Deng
General AI
We introduce WAFT-Stereo, a simple and effective warping-based method for stereo matching. WAFT-Stereo demonstrates that cost volumes, a common design used in many leading methods, are not necessary for strong performance and can be replaced by warping with improved efficiency. WAFT-Stereo ranks first on ETH3D, KITTI a…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.8
2026-03-26 · Niccolò Cavagnero, Narges Norouzi, Gijs Dubbelman, Daan de Geus
General AI
Vision Foundation Models (VFMs) pre-trained at scale enable a single frozen encoder to serve multiple downstream tasks simultaneously. Recent VFM-based encoder-only models for image and video segmentation, such as EoMT and VidEoMT, achieve competitive accuracy with remarkably low latency, yet they require finetuning th…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.8
2026-03-26 · Ligong Han, Hao Wang, Han Gao, Kai Xu, Akash Srivastava
General AI
Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressi…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-03-25 · Xiaoyu Tang, Jun Dong, Jintao Cheng, Rui Fan
General AI
Remote sensing visual grounding (RSVG) aims to localize specific targets in remote sensing images using natural language expressions. However, existing methods are restricted to single-sensor domains, i.e., either optical or synthetic aperture radar (SAR), limiting their real-world applicability. In this paper, we intr…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-03-26 · Koichi Takahashi
General AI
This paper introduces Conchordal, a bio-acoustic instrument for generative composition whose sonic agents are governed by artificial life dynamics within a psychoacoustic fitness landscape. The system is built on Direct Cognitive Coupling (DCC), a design principle requiring that generative dynamics operate directly wit…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-03-26 · Dingxi Zhang, Fangjinhua Wang, Marc Pollefeys, Haofei Xu
General AI
Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-03-26 · Kubra Aksoy, Adnan Rashid, Osman Hasan, Sofiene Tahar
General AI
Network topology matrices are algebraic representations of graphs that are widely used in modeling and analysis of various applications including electrical circuits, communication networks and transportation systems. In this paper, we propose to use Higher-Order-Logic (HOL) based interactive theorem proving to formali…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 6.6
2026-03-26 · Yawen Luo, Xiaoyu Shi, Junhao Zhuang, Yutian Chen, Quande Liu, Xintao Wang, Pengfei Wan, Tianfan Xue
General AI
Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. By reformulat…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.8
2026-03-24 · Jingchen Ni, Quan Zhang, Dan Jiang, Keyu Lv, Ke Zhang, Chun Yuan
General AI
Existing camouflage object detection (COD) methods typically rely on fully-supervised learning guided by mask annotations. However, obtaining mask annotations is time-consuming and labor-intensive. Compared to fully-supervised methods, existing weakly-supervised COD methods exhibit significantly poorer performance. Eve…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.4
2026-03-15 · Jaeyo Shin, Jiwook Kim, Hyunjung Shim
General AI
Representation Alignment (REPA) has emerged as a simple way to accelerate Diffusion Transformers training in latent space. At the same time, pixel-space diffusion transformers such as Just image Transformers (JiT) have attracted growing attention because they remove a dependency on a pretrained tokenizer, and then avoi…
- Review
- pending
- Role
- unreviewed
- Read
- later