arxiv
Score 36.4
2026-06-22 · Ulas Berk Karli, Tesca Fitzgerald
Research Track A · General AI
Vision-Language-Action (VLA) models are commonly fine-tuned through passive imitation learning, where additional demonstrations are collected for tasks where the policy performs poorly. This approach incurs several downsides: it requires the robot to fail before data collection is triggered, provides little guidance ab…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 24.4
2026-06-20 · Mohammed Rawhani, Dervis Karaboga, Ozkan Ufuk Nalbantoglu, Alper Basturk, Bahriye Akay
Research Track A · General AI
Pre-trained language models struggle when applied to new domains, as full fine-tuning is computationally expensive and prone to catastrophic forgetting. This study addresses this challenge by presenting a novel parameter-efficient strategy for unsupervised domain adaptation that combines custom PEFT architectures with …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 23.9
2026-06-22 · Amrita Singh, Rishabh Jha
Research Track A
Medical vision-language models (VLMs) such as BiomedCLIP generalize broadly, but adapting them to a clinical service is as much a safety problem as an accuracy one. Updating a deployed model for a new imaging modality can fail silently in two ways that harm patients: it can forget modalities it already handled (catastr…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 22.2
2026-06-23 · Xinyu Mao, Yuhui Zeng, Xiaokun Liu, Wenyu Qin, Meng Wang, Xin Tao, Pengfei Wan, Xiaohan Xing, Max Meng
General AI
Cinematographic captioning aims to describe how a video is filmed using professional film-language concepts such as camera movement, shot size, depth of field, composition, and shooting angle. This capability is important for fine-grained video understanding and controllable movie-quality video generation, yet remains …
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 21.4
2026-06-23 · Shiding Zhu, Yudi Qi, Yajie Wang, Jiaze Li, Chao Song, Yaorui Shi, Yibo Miao, Hanqi Gao, Kai Zhang
General AI
Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent executes tasks, summarizes outcomes, and determines memory content. This setup makes agents …
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 21.4
2026-06-23 · Yuxin Zuo, Zikai Xiao, Li Sheng, Fei Huang, Jianhong Tu, Yuxuan Liu, Tianyi Tang, Xiaomeng Hu, Yang Su, Qingfeng Lan, Yantao Liu, Qin Zhu, Yinger Zhang, Bowen Yu, Haiquan Zhao, Haiyang Xu, Jianxin Yang, Jiayang Cheng, Junyang Wang, Lianghao Deng, Mingfeng Xue, Tianyi Bai, Yang Fan, Yubo Ma, Yucheng Li, Zeyu Cui, Zhihai Wang, Zhihui Xie, Zhuorui Ye, An Yang, Dayiheng Liu, Jingren Zhou, Ning Ding
General AI
A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling based on language models can further push the boundaries of general agents. (i) We first focus on building foundation m…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 21.2
2026-06-23 · Wei Zhou, Xuanhe Zhou, Shaokun Han, Hongming Xu, Guoliang Li, Zhiyu Li, Feiyu Xiong, Fan Wu
Research Track A · General AI
Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that supports persistent information storage, retrieval, update, consolidation, and dynamic lifecycle governance throughout agent execution. Despite this evolution, existing evaluati…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 20.2
2026-06-23 · Wenxin Wang, Bo Zhang, Feng Chen, Zixuan Wang, Wen Li, Changsheng Li, Yinjie Lei
General AI
Recent advancements have explored agentic zero-shot 3D understanding by reformulating it as video keyframe understanding with Multimodal Large Language Models (MLLMs). However, existing methods face an intrinsic bottleneck due to the finite observation perspectives inherent in videos and the implicit perception of 3D s…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 19.8
2026-06-16 · Xuelong Dai, Jianyu Ma, Boyang Ma, Biwei Yan, Yijun Yang, Yue Zhang
Research Track B · General AI
Multimodal Large Language Model (MLLM)-based web agents provide practical, high-precision solutions for visual browser automation; however, they inherently expand the attack surface, introducing novel vision-based vulnerabilities. Existing adversarial evaluations targeting these agents frequently rely on permissive thr…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 19.5
2026-06-19 · Yu Luo
Research Track A · General AI
Social intelligence is a core competency for language agents, yet current research primarily focuses on static capability evaluation rather than how these skills are continuously shaped and accumulated. This gap calls for a shift toward sustainable learning paradigms. Currently, two methodological pain points exist: so…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 18.9
2026-06-22 · Chuangxin Zhao, Canran Xiao, Siyuan Ma, Mengyao Lyu, Yanbiao Ma, Jun Xia, Guiguang Ding, Yang Liu
Research Track A · General AI
Multimodal large language models (MLLMs) are increasingly required to adapt to non-stationary streams of visual domains, question types, and user instructions, yet continual fine-tuning often causes severe forgetting of previously acquired multimodal skills. Existing continual vision-language methods mainly preserve ou…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 18.2
2026-06-23 · Hovhannes Tamoyan, Sean Narenthiran, Erik Arakelyan, Mira Mezini, Boris Ginsburg
General AI
LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedicated localization frameworks have emerged, yet are still evaluated as file retrieval rather than actionable diagnosis, producing locations without the diagnostic context a re…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 18.2
2026-06-23 · Xiaowei Gao, Pengxiang Li, Yitai Cheng, Ruihan Xu, James Haworth, Stephen Law, Yun Ye
General AI
Recent multimodal large language models (MLLMs) have shown strong potential for autonomous driving scene understanding, yet existing methods still face a fundamental trade-off between temporal reasoning and spatial precision. Models that rely on single-frame or low-resolution inputs often miss small, distant, or partia…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 17.5
2026-06-19 · Jianwei Lou
Research Track A · General AI
Continual learning that is gradient-free, local, online, and append-only is attractive for edge and streaming deployment, but its value is usually argued informally. We give a provable account on recurring-regime streams. Given segmentation, a warm-start library learner attains amortized recovery cost $O\!\big(KD/\vare…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 17.0
2026-06-19 · Jiacheng Wang, Xinjia He, Qi Ding, Yutao Yang, Jie Zhou, Liyang Yu, Liang Dou, Qin Chen
Research Track A · General AI
Continual learning (CL) is commonly studied under the assumption that sequential tasks are semantically related or structurally similar. However, in highly heterogeneous settings, where tasks differ substantially in reasoning patterns and input-output formats, existing methods often suffer from catastrophic forgetting …
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 16.4
2026-06-23 · Chenhao Dang, Dantong Zhu, Jun Yang, Conghui He, Weijia Li
General AI
Multimodal misinformation detection is increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text-image framing errors. Existing benchmarks and methods remain poorly matched to this setting: they usually isolate short captions, single images, …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 16.2
2026-06-23 · Linpeng Huang, Weixing Chen, Zexin Chen, Yang Liu, Liang Lin
General AI
Recent advances in Video Large Language Models (Video-LLMs) have yielded promising performance on video question answering (VideoQA). Nevertheless, existing benchmarks are predominantly evaluated through answer correctness, while the grounding of predictions in relevant video evidence remains largely unexamined. This d…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 16.2
2026-06-23 · Zixuan Li, Haokun Lin, Yicheng Xiao, Zhiwei Li, Xinyang Song, Zelong Zheng, Yong He, Heng Yao, Ke Ding, Chao Yu, Chuan Yuan, Qi Li, Zhenan Sun
General AI
Unified multi-modal large language models (MLLMs) have achieved strong text-to-image generation quality, but still struggle with structure-aware prompt following, where object counts, spatial relations, attribute bindings, and coarse layouts must be preserved. We attribute this limitation in part to the entanglement of…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 15.2
2026-06-23 · Ali Pourghasemi Fatideh, Wilder Baldwin, Maria Dhakal, Collin McMillan, Sepideh Ghanavati
General AI
LLM-based dialogue assistants have become mainstream tools for software developers, yet current evaluation benchmarks focus exclusively on functional correctness. This leaves a critical gap in assessing the quality and accuracy of these conversations when handling Non-Functional Requirements (NFRs), which are inherentl…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 15.0
2026-06-11 · Shihao Xu, Tiancheng Zhou, Jiatong Ma, Yanli Ding, Yiming Yan, Ming Xiao, Guoyi Li, Haiyang Geng, Yunyun Han, Jianhua Chen, Yafeng Deng
General AI
Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment. Progress in AI-assisted psychiatric diagnosis is constrained by the absence of benchmarks that simu…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 14.9
2026-06-20 · Rishi Srivastava
Research Track B · General AI
We introduce CFAgentBench, a reproducible, self-hostable environment and benchmark for autonomous construction-finance agents: a CFO/controller-class agent operating across the real software stack a US construction finance team runs - ERP, project management, email, documents, pay applications, payroll, certified payro…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 14.8
2026-06-18 · To Eun Kim, Xuhong He, Dishank Jain, Ambuj Agrawal, Negar Arabzadeh, Fernando Diaz
Research Track B · General AI
The decentralized deployment of LLM agents with diverse capabilities across diverse tasks motivates infrastructure for knowledge sharing across heterogeneous agent populations. Just as search engines index human-generated artifacts to support human problem solving, retrieval systems can organize agent-generated artifac…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 14.5
2026-06-16 · Aagam Sogani, Botao Rui, Swetha Vaidyanathan, Rishi Agarwal, Minghao Yan, Shivaram Venkataraman
Research Track B · General AI
Long-horizon web agents often fail in ways hidden by final-answer evaluation: they may visit useful pages, produce a well-formed answer, and terminate confidently while still missing fields, over-including unsupported items, or relying on stale evidence. We study these failures with Parallel WebBench, a parallel web-ex…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 14.4
2026-06-22 · Yuting Li, Weihang Fang, Haoyuan Gao, Linghe Kong, Yexin Li, Lichao Sun, Weiran Huang
Research Track A · General AI
The rapid deployment of Vision-Language Models (VLMs) in dynamic environments necessitates the ability to learn continuously without forgetting. However, traditional continual learning (CL) settings often rely on white-box paradigms, which is increasingly invalidated by the shift toward cloud-hosted models. In this pap…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 14.4
2026-06-23 · Yixuan Tang, Yi Yang
General AI
Dense retrieval embedding models are a fundamental component of modern retrieval-based AI systems. Most dense retrievers are trained with contrastive objectives, which require labeled positive and negative document pairs that are often costly and difficult to obtain. In this work, we investigate whether the autoregress…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 13.2
2026-06-23 · Haorui Ji, Weizhe Liu, Hongdong Li, Hengkai Guo
General AI
Sparse voxel representation has emerged as a scalable foundation for image-to-3D Gaussian Splatting (3DGS) generation, yet current methods struggle to preserve high-frequency visual details of input images due to two structural bottlenecks. First, they adopt discriminative 2D features optimized for semantic abstraction…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 13.2
2026-06-23 · Lavinia Ghita, Dhruv Desai, Ioana Boier
General AI
Large Language Models (LLMs) achieve strong performance across a growing range of domains, yet their scale poses deployment challenges in applications where latency and cost constraints are critical. This paper derives empirical scaling laws for domain-specific LLM compression, quantifying how in-domain and general kno…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 12.4
2026-06-22 · Subarnaduti Paul, Yohan Jung, Mohammad Emtiyaz Khan, Siddharth Swaroop, Thomas Möllenhoff, Martin Mundt
Research Track A · General AI
Continual learning remains a major challenge for modern deep networks, partly because commonly used optimizers lack inherent mechanisms for continual adaptation. One such natural mechanism is fast and slow adaptation to balance stability and plasticity. This mechanism has deep roots in neuroscience and biology, but the…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 12.4
2026-06-23 · Xirui Li, Zhe Liu, Xiaoqing Ye, Wenhua Han, Yifeng Pan, Junyu Han, Hengshuang Zhao
General AI
Multimodal driving planning faces a long-standing tension between two paradigms: scoring-based methods benefit from dense reward supervision but are confined to a fixed action vocabulary, while anchor-based methods generate proposals dynamically yet suffer from sparse supervision constrained to a single ground-truth tr…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 12.4
2026-06-23 · Chenhao Dang, Jing Ma, Mingjie Liao
General AI
The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data mixtures during training, has emerged as a promising direction to improve efficiency. Howeve…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 12.2
2026-06-23 · Anand Kamat, Daniel Blake, Brent M. Werness
General AI
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet they remain prone to generating hallucinations. Detecting these hallucinations is critical for deploying LLMs reliably in high-stakes applications. We present Grad Detect, a gradient-based approach for predicting hallucinat…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 12.2
2026-06-23 · Simone Gallivanone, Hossein Khodadadi, Mauro Dore, Mauro Medda, Nicola Franco
General AI
We introduce a large-scale, open-source dataset of pre-generated adversarial attacks for vision-language models (VLMs). The dataset is designed to be diverse, representative, and practical, extending existing benchmarks by covering 10 high-level categories and 55 subcategories of harmful intents. Our primary goal is to…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 11.4
2026-06-22 · Shanhui Zhao, Jiacheng Liu, Guohong Liu, Jichao Yan, Jialei Ye, Yuhao Yang, Hao Wen, Shizuo Tian, Yizhen Yuan, Yuxuan Chen, Yunxin Liu, Ju Ren, Ya-Qin Zhang, Chao Huang, Yao Guo, Yuanchun Li
General AI
AI agents are driving a new software paradigm, with the ability to autonomously call tools, extract information, manage memory, and complete tasks that span applications and data sources. Most existing end-user operating systems, however, are designed for application-centric workflows and offer little native support fo…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 11.4
2026-06-23 · Jiayi Lei, Yuandong Pu, Xingyu Han, Rongpeng Zhu, Jing Xu, Jinyao Wang, Zijian Zhou, Bin Fu, Yuewen Cao, Yihao Liu, Yongsheng Li
General AI
Text-to-image (T2I) generation models have achieved remarkable progress in producing visually realistic images from natural language prompts. Yet it remains unclear whether their success reflects genuine causal understanding or sophisticated pattern matching over visual-textual correlations. Inspired by Russell's induc…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 11.4
2026-06-23 · Yuru Wang, Lejun Cheng, Yuxin Zuo, Sihang Zeng, Bingxiang He, Che Jiang, Junlin Yang, Yuchong Wang, Kaikai Zhao, Weifeng Huang, Kai Tian, Zhenzhao Yuan, Jincheng Zhong, Weizhi Wang, Ning Ding, Bowen Zhou, Kaiyan Zhang
General AI
We introduce NatureBench, a cross-discipline benchmark of 90 tasks distilled from peer-reviewed Nature-family publications, designed to evaluate whether AI coding agents can move beyond reproduction toward discovery on real scientific problems. NatureBench is built on NatureGym, an automated pipeline that constructs a …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 11.2
2026-06-23 · Qi Chen, Wenxuan Li, Pedro R. A. S. Bassi, Xinze Zhou, Jakob Wasserthal, Ibrahim Ethem Hamamci, Sezgin Er, Ashwin Kumar, Yiwen Ye, Yuhan Wang, Yuyin Zhou, Akshay S. Chaudhari, Curtis Langlotz, Kang Wang, Yang Yang, Alan L. Yuille, Zongwei Zhou
General AI
Artificial intelligence (AI) has achieved remarkable success in medical imaging, but it is widely recognized that these models often perform inconsistently across real-world clinical settings. Such inconsistencies occur when patient demographics and imaging protocols vary, for example, in detecting small tumors, analyz…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 11.0
2026-06-18 · Dayeon Kang, Hyejun Jeong, Jade Sheffey, Pubali Datta, Amir Houmansadr
Research Track B · General AI
As AI web agents proliferate, combining large language models with autonomous, browser-level control, indiscriminate content scraping by web agents has emerged as a privacy and security challenge. Existing defenses, such as robots.txt and active bot-blocking, are insufficient, as they are widely violated and easily cir…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 10.9
2026-06-23 · J. Fernando Hernandez-Garcia, Tomás Figliolia, Beren Millidge
Research Track A · General AI
The loss of plasticity - the ability of a network to learn new information after having already learned older information - is a fundamental challenge in creating artificial neural networks capable of continual learning. Although this phenomenon has been known for decades, it has mostly been studied in older, relativel…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 10.7
2026-06-18 · Guangyi Liu, Gao Wu, Congxiao Liu, Pengxiang Zhao, Liang Liu, Mading Li, Qi Zhang, Mengyan Wang, Liang Guo, Yong Liu
Research Track B · General AI
MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 10.4
2026-06-23 · Zhihao Wang, Jianxiong Li, Yu Cui, Yuan Gao, Xianyuan Zhan, Junzhi Yu, Xiao Ma
General AI
Generalist value models play a pivotal role in scaling robotic policy learning from large-scale, mixed-quality data. Mathematically, accurate value estimation demands deep temporal understanding, requiring models to both ground the current belief using historical context and plan over future outcomes. However, most exi…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 10.2
2026-06-23 · Xingjian Leng, Jaskirat Singh, Zhanhao Liang, Ethan Smith, Martin Bell, Aninda Saha, Yuhui Yuan, Liang Zheng
General AI
Diffusion transformer (DiT) research on image generation has converged to a single evaluation setup: class-conditional generation on ImageNet. While methods improve the FID and related metrics, it is increasingly unclear whether they reflect real progress in generative modeling. The natural alternative, i.e., text-to-i…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 10.2
2026-06-23 · Orest Kupyn, Goutam Bhat, Philipp Henzler, Fabian Manhardt, Christian Rupprecht, Federico Tombari
General AI
Generating explorable 3D scenes from a single image requires strong generative priors and accurate geometric representations suitable for downstream use. Current video diffusion models offer high-quality generation and implicitly encode multi-view geometric structure in latent space. However, existing feedforward laten…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 10.0
2026-06-18 · Ganlin Yang, Zhangzheng Tu, Yuqiang Yang, Sitong Mao, Junyi Dong, Tianxing Chen, Jiaqi Peng, Jing Xiong, Jiafei Cao, Jifeng Dai, Wengang Zhou, Yao Mu, Tai Wang
General AI
Memory remains a critical bottleneck for long-horizon robotic manipulation, as standard Vision-Language-Action (VLA) policies often fail when task-relevant cues become occluded or unobservable over time. While existing memory-augmented methods utilize historical context, they either suffer from severe information bottl…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 9.2
2026-06-20 · Colin Samplawski, Ramneet Kaur, Manoj Acharya, Anirban Roy, Adam D. Cobb
General AI
Large multi-modal language models are increasingly deployed in high-stakes domains, making well-calibrated uncertainty essential. Traditional Bayesian methods approximate posteriors over all model weights, which becomes intractable for modern large models. For this reason, recent work instead considers Bayesian low-ran…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 9.2
2026-06-23 · Laura Colazzo, Giuseppe Anzillotti
General AI
Agentic Web Browsers (AWBs), powered by Large Language Models (LLMs), are emerging as autonomous systems capable of navigating the Web on behalf of users. Beyond enhancing productivity, they could also offer significant promise as Assistive Technologies (ATs) for visually-impaired individuals, transforming web interact…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 9.2
2026-06-23 · Negin Raoof, Richard Zhuang, Marianna Nezhurina, Etash Guha, Atula Tejaswi, Ryan Marten, Charlie F. Ruan, Tyler Griggs, Alexander Glenn Shaw, Hritik Bansal, E. Kelly Buchanan, Artem Gazizov, Reinhard Heckel, Chinmay Hegde, Sankalp Jajee, Daanish Khazi, Emmanouil Koukoumidis, Xiangyi Li, Hange Liu, Shlok Natarajan, Harsh Raj, Nicholas Roberts, Ethan Shen, Nishad Singhi, Michael Siu, Ashima Suvarna, Hanwen Xing, Patrick Yubeaton, Robert Zhang, Leon Liangyu Chen, Xiaokun Chen, Steven Dillmann, Saadia Gabriel, Xunyi Jiang, Anurag Kashyap, Boxuan Li, Yein Park, Minh Pham, Sujay Sanghavi, Lin Shi, Ke Sun, Yixin Wang, Zhiwei Xu, Erica Zhang, Siyan Zhao, Wanjia Zhao, Jenia Jitsev, Alex Dimakis, Benjamin Feuer, Ludwig Schmidt
General AI
Agentic language models dramatically expand the applications of AI yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that ge…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 9.2
2026-06-23 · Yikai Lu, Yifei Wu, Xinyu Lu, Tongxin Li
General AI
In the big-world regime, agents cannot be universally capable and their ability is inevitably specialized across a world model in pieces. Consequently, standard uniform guarantees fail to distinguish between the understanding of critical bottlenecks and irrelevant failures. We first formalize this limitation by proving…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.2
2026-06-23 · Tian Zheng, Kai-Tai Hsu
General AI
Agentic data analysis systems produce rich outputs, including code, numerical results, and verbal diagnostics. This makes them more challenging to evaluate than single-turn LLM responses. It is therefore necessary to distinguish genuine disagreement between an agent's output and a ground-truth answer from grading artif…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.2
2026-06-23 · Marvin Rüdt, Hao Pang, Constantin Enke, Zäzilia Seibold, Kai Furmans
General AI
Autonomous mobile robots operating in intralogistics environments rely on geometric maps for localization and navigation, but lack semantic understanding of objects and their contextual properties. We present a contextual semantic mapping pipeline that combines SLAM-based geometric mapping, SAM-based instance segmentat…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 7.2
2026-06-20 · Haidong Huang, Xixin Zhao, Yaohua Zhou, Jiayu Song, Jiayi Zhang, Jun Ma, Haiyue Zhu, Xiaocong Li
General AI
Diffusion models excel at capturing multi-modal action distributions in robot imitation learning. However, in multi-task and long-horizon scenarios, monolithic architectures lack structural generalization capabilities, suffering from gradient conflicts between distinct semantic sub-stages. While pure data-driven Mixtur…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 7.2
2026-06-23 · Anna Fang
General AI
Poorly designed interventions or those deployed without adequate safeguards can harm the communities they aim to serve, thus exacerbating existing vulnerabilities and leaving individuals unsupported. This is especially the case for the mental health context, where there is a growing trend of relying on technological in…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 7.0
2026-06-18 · Luca Zedda, Davide Antonio Mura, Cecilia Di Ruberto, Maurizio Atzori, Muhammed Furkan Dasdelen, Carsten Marr, Andrea Loddo
General AI
Attention-based Multiple Instance Learning aggregators in medical imaging are prone to attention concentration, producing overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this through four synergistic architectural components: RMSNorm-based pre-normalization, pe…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 6.8
2026-06-18 · Jyotsna Singh, Ash Black, Jeff Larsen, Scott R. Saleska
General AI
Researchers are interested in learning about Mars so that it may eventually become habitable for humans. To achieve this, there is a need for comprehensive knowledge of the planet's atmosphere, hydrology, surface chemistry, radiation environment, and spatial features through the scientific literature. These contain val…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 6.2
2026-06-23 · Zhuoren Ye, Tianyu Wo, Dinghao Xue, Mingming Zhang, Yuchen Teng, Chunming Hu, Renyu Yang
General AI
Emerging LLM services increasingly host many sparse MoE models, yet most models receive sparse requests and remain cold. This creates a GPU memory problem: model weights are stable and model-determined, while KV-cache is transient and demand-determined. Because cold models rarely reach peak KV-cache demand at the same …
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 6.2
2026-06-23 · Maggie Wang, Lars Osterberg, Stephen Tian, Ola Shorinwa, Jiajun Wu, Mac Schwager
General AI
Vision-language-action (VLA) models can learn manipulation skills from demonstrations, but their capabilities are bounded by the skills in the training data. We present InSight, a framework that unlocks autonomous skill acquisition by rendering VLAs steerable at the primitive-action level (e.g., "move gripper to the bo…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 5.8
2026-06-18 · Prabhjot Singh, Somnath Luitel, Manmeet Singh, Josh Durkee
General AI
AI systems for peer review fail on three fronts: they train on Computer Science and Machine Learning venues alone, ignore the iterative dialogue that validates science, and evaluate on stylistic mimicry rather than real editorial judgment. We introduce FirstPass, a dataset and fine-tuned model that addresses all three.…
- Review: pending
- Role: unreviewed
- Read: later
huggingface
Score 5.4
2026-06-22 · Wenlong Cheng, Yuan Gan, Yunqiu Xu, Jiaxu Miao
General AI
Training Latent Diffusion Models (LDMs) within Federated Learning (FL) has attracted increasing attention due to its ability to combine the powerful generative capacity of LDMs with the privacy-preserving properties of FL. However, FL requires sharing the global model with multiple participants, which risks unauthorize…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 5.2
2026-06-21 · Neranjan Senarath, Rohit Muralitharan, Sadia Asif
General AI
Federated low-rank adaptation methods are attractive for fine-tuning large models under communication and privacy constraints, but heterogeneous client data can induce geometric misalignment between local low-rank updates. We study whether this subspace misalignment leads to destructive aggregation and slower convergen…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 5.2
2026-06-23 · Jooyoung Cha, Yuya Sasaki, Nelson Matthew P. Tan
General AI
We propose methods for constructing lower bounds on the standard errors of parameters estimated from moment conditions obtained across different samples. Sharp explicit bounds are derived by exploiting geometric inequalities when no information about correlations across samples is available. Furthermore, we develop com…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 5.2
2026-06-23 · Sumera Rehman, Ashkan Ajeer, Connor Darling, Marco Endrizzi, Alessandro Olivo, Silvia Cipiccia
General AI
Simultaneous structural and elemental characterisation of a specimen in a non-destructive manner is an instrumental approach with applications in a variety of fields including energy materials, cultural heritage and life sciences. This is routinely performed at synchrotron facilities, e.g. by combining X-ray imaging an…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 5.2
2026-06-23 · Jason Sulskis, Sathya Ravi
General AI
Fourier Neural Operators (FNO) learn solution operators of partial differential equations by parameterizing global convolutions in the complex Fourier domain. For real-valued PDE solutions, the complex FFT carries representational redundancy through conjugate symmetry. We introduce the Hartley Neural Operator (HNO), th…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 5.2
2026-06-23 · Lena Becker, Holger Hermanns
General AI
Markov jump linear systems (MJLSs) model dynamical phenomena subject to random switching among multiple linear modes, driven by an underlying Markov chain. Classical notions such as mean and mean-square stability characterize the long-term asymptotic behaviour of the first and second moments of an MJLS, but they can be…
- Review: pending
- Role: unreviewed
- Read: later