huggingface
Score 28.4
2026-06-22 · Haggai Roitman
General AI
The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central thesis: building great agentic systems requires understanding every layer of the pipeline, not ju…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 26.4
2026-06-24 · Haoxiang Sun, Zhihang Yi, Langxuan Deng, Yuhao Zhou, Peiqi Jia, Jian Zhao, Li Yuan, Jiancheng Lv, Tao Wang
General AI
Fine-grained visual reasoning requires multimodal large language models (MLLMs) to identify task-relevant visual evidence and ground their reasoning in local image regions. Existing agentic methods typically rely on reinforcement learning with verifiable rewards or supervised fine-tuning on large-scale annotated reason…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 25.9
2026-06-24 · Luke McDermott, Robert W. Heath, Rahul Parhi
Research Track A · General AI
Lifelong continual learning remains an obstacle on the path to human-like intelligence. Modern transformers show sparks of intelligence with in-context learning. The quadratic nature of attention, however, prohibits transformers from performing this process on arbitrarily long sequences. In this work, we argue that ext…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 22.2
2026-06-23 · Tianyu Yang, Sudipta Paul, Vijay Srinivasan, Vivek Kulkarni, Srinivas Chappidi
Research Track A · General AI
Large language model (LLM) agents rely on long-term memory to support extended interactions and personalized assistance beyond finite context windows. Existing memory agents actively update external memory through generated write, revise, and delete operations, but these updates may omit important information, corrupt …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 21.2
2026-06-24 · Akshay Paruchuri, Sanmi Koyejo, Ehsan Adeli
General AI
Standard benchmarks for multimodal large language models (MLLMs) score each item on one canonical ordering and miss whether order-irrelevant shuffling changes the answer, a baseline reliability property called for by emerging AI evaluation guidelines. We introduce Facet-Probe, a five-facet audit (option, evidence-chunk…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 21.0
2026-06-15 · Yongjia Lei, Nedim Lipka, Zhisheng Qi, Utkarsh Sahu, Koustava Goswami, Franck Dernoncourt, Ryan A. Rossi, Yu Wang
General AI
Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit and complex reasoning beyond surface-level semantic or lexical matching (e.g., mathematical problems relying on the same theorem or codin…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 20.2
2026-06-24 · Yu-Yang Chen, Lan-Zhe Guo
General AI
Multimodal Large Language Models (MLLMs) demonstrate strong performance on standard visual question answering benchmarks, yet their scalability under controlled structural complexity remains poorly understood. We introduce TriViewBench, a controlled three-view visual reasoning benchmark constructed from synthetic 3D sc…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 19.9
2026-06-23 · Ahmed Anwar, Andreas Wagner, Federico Raue, Tobias Nauen, Andreas Dengel
Research Track A
Accuracy degradation is the standard metric for Catastrophic Forgetting (CF); however, it records only whether forgetting occurred or not. It saturates at the extremes and collapses discretely at task boundaries, hiding the internal structure of what is being forgotten. We introduce six softmax-derived metrics spanning…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 19.2
2026-06-24 · Mingguang Chen, Bo Qu
General AI
Large language models are increasingly deployed as investment research assistants, yet no benchmark tests whether they can accurately reconstruct and apply the specific procedural decision frameworks of expert investors. We introduce InvestPhilBench, a multi-layer dynamic benchmark spanning eight cognitive tiers, from …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 19.2
2026-06-24 · Changdae Oh, Wendi Li, Seongheon Park, Samuel Yeh, Tanwi Mallick, Sharon Li
General AI
Process reward models enable fine-grained, step-level evaluation of LLMs, yet building them for agentic settings remains prohibitively difficult: long-horizon interactions, irreversible actions, and stochastic environment feedback make both human annotation and Monte Carlo estimation infeasible at scale. In this work, …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 19.2
2026-06-24 · Yupu Hao, Zhuoran Jin, Huanxuan Liao, Kang Liu, Jun Zhao
General AI
Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited gains in tool-use tasks. In our experiments, some models exhibit catastrophic collapse, wh…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 17.9
2026-06-23 · Beining Wu, Zihao Ding, Jun Huang, Yanxiao Zhao
Research Track A · General AI
On-device language-model agents improve by accumulating experience in retrieved memory rather than by updating weights. This memory is hard-bounded and exposed: it consumes RAM and energy, reaches peers through a thin uplink, and becomes an attack surface because it is writable by what the agent reads. Existing systems…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 16.4
2026-06-21 · Zhuoran Jin, Kejian Zhu, Hongbang Yuan, Yupu Hao, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
General AI
Chain-of-Thought (CoT) has become a standard method for improving reasoning capabilities in large language models (LLMs) by eliciting step-by-step thinking, but its effectiveness in multimodal tasks remains unclear. In this paper, we aim to systematically investigate the key question: What can multimodal Chain-of-Thoug…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 16.2
2026-06-24 · Liang-Yuan Wu, Zih-Ching Chen, Tongshuang Wu, Chao-Han Huck Yang, Hua Shen
General AI
As multimodal conversational systems increasingly engage in spoken interaction, their ability to navigate paralinguistic social cues has become a critical bottleneck for natural human-AI communication. However, existing evaluations of machine emotional intelligence assess reasoning exclusively through isolated text or …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 15.9
2026-06-23 · Yujiang He, Frederic Uhrweiller, Bernhard Sick
Research Track A
Power forecasting models deployed in real-world energy markets must operate under nonstationary conditions, where data distributions continually evolve due to weather variability, infrastructure upgrades, and changing consumption behaviors. In practice, these models face strict operational constraints: historical data …
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 15.4
2026-06-23 · Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt
General AI
"Talk short. Drop grammar. Save token." This caveman style is widely promoted as a way to cut inference cost, but whether it actually saves anything depends on which channel (the user's prompt or the model's response) is being compressed. We present Cavewoman, a two-channel evaluation protocol that scores every generat…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 15.4
2026-06-24 · Jiayu Li, Yixiao Fang, Tianyu Hu, Wei Cheng, Ping Huang, Zheheng Fan, Gang Yu, Xingjun Ma
General AI
Real-world photography requires capture-time guidance for both camera framing and subject pose. Yet existing aesthetic cropping benchmarks mainly evaluate post-hoc crop prediction and overlook subject-side recommendations, leaving the capture-time guidance capabilities of multimodal large language models (MLLMs) undere…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 14.4
2026-06-23 · Fengfeng Liang, Yuechen Zhang, Jiaya Jia
General AI
Existing low-bit KV-cache quantizers often treat each cached key as a flat vector. Under RoPE, however, a key's contribution to a future attention logit decomposes into a position-dependent sum over two-dimensional frequency blocks. This makes key-cache quantization a block-wise bit-allocation problem: high-energy RoPE…
- Review: pending
- Role: unreviewed
- Read: now
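The KV-cache quantization abstract above rests on the fact that, under RoPE, a query-key logit splits into one term per two-dimensional frequency block. The Python sketch below checks that decomposition numerically. It assumes the common RoPE convention (adjacent dimension pairs, block frequency `base ** (-2i/d)` with `base = 10000`); the function names are hypothetical, and nothing here reproduces the authors' quantizer.

```python
import math
import random

def rope_rotate(x, pos, base=10000.0):
    """Rotate each 2-D block (x[2i], x[2i+1]) of x by the angle pos * theta_i (len(x) must be even)."""
    d = len(x)
    out = []
    for i in range(d // 2):
        theta = base ** (-2 * i / d)  # frequency of block i (assumed convention)
        a = pos * theta
        x0, x1 = x[2 * i], x[2 * i + 1]
        out.append(x0 * math.cos(a) - x1 * math.sin(a))
        out.append(x0 * math.sin(a) + x1 * math.cos(a))
    return out

def rope_logit(q, k, m, n, base=10000.0):
    """Logit between a query at position m and a key at position n after RoPE."""
    qr, kr = rope_rotate(q, m, base), rope_rotate(k, n, base)
    return sum(a * b for a, b in zip(qr, kr))

def block_terms(q, k, m, n, base=10000.0):
    """One term per frequency block: q_b . R((n - m) * theta_b) k_b."""
    d = len(q)
    terms = []
    for i in range(d // 2):
        phi = (n - m) * base ** (-2 * i / d)  # relative angle for block i
        q0, q1 = q[2 * i], q[2 * i + 1]
        k0, k1 = k[2 * i], k[2 * i + 1]
        # rotate the raw key block by the relative angle, then dot with the raw query block
        rk0 = k0 * math.cos(phi) - k1 * math.sin(phi)
        rk1 = k0 * math.sin(phi) + k1 * math.cos(phi)
        terms.append(q0 * rk0 + q1 * rk1)
    return terms

random.seed(0)
q = [random.gauss(0.0, 1.0) for _ in range(8)]
k = [random.gauss(0.0, 1.0) for _ in range(8)]
parts = block_terms(q, k, m=5, n=2)
print(len(parts), math.isclose(sum(parts), rope_logit(q, k, 5, 2), abs_tol=1e-9))
```

Each block term is the dot product of a query block with a rotated key block, so its magnitude is bounded by the product of the two block norms; that is one way to see why bits might be allocated per block rather than per flat key vector.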
huggingface
Score 14.4
2026-06-24 · Shen Nie, Qiyang Min, Shaoxuan Xu, Zihao Huang, Yuxuan Song, Yong Shan, Yankai Lin, Wayne Xin Zhao, Chongxuan Li, Ji-Rong Wen
General AI
Modern large language models are predominantly trained with autoregressive factorization and causal attention. We present iLLaDA, an 8B masked diffusion language model trained from scratch with fully bidirectional attention. iLLaDA keeps the masked diffusion objective throughout pre-training and supervised fine-tuning …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 14.2
2026-06-24 · Ilia Kulikov, Chenxi Whitehouse, Tianhao Wu, Yixin Nie, Swarnadeep Saha, Eryk Helenowski, Weizhe Yuan, Olga Golovneva, Jack Lanchantin, Yoram Bachrach, Jakob Foerster, Xian Li, Han Fang, Sainbayar Sukhbaatar, Jason Weston
General AI
We introduce Autodata, a general method that enables AI agents to act as data scientists who build high-quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to create even stronger data. We describe the overall formulation, and a specific practical im…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 13.2
2026-06-24 · Samuel Valland Lyngset, Tor Viljen Raanaas, Gard Sveipe, Eirik Møller Nilsen, Jim Torresen, Kai Olav Ellefsen, Tobias Lømo
General AI
When fine-tuning Large Language Models (LLMs), Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) have succeeded in reducing both memory usage and computation. In this article, we explore whether this approach is transferable to the world of robotics and Reinforcement Learning …
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 13.2
2026-06-24 · Filippos Bellos, Andre S. Gala-Garza, Miaowei Wang, Alyssa M. Hardin, Ahmad M. Hider, Yayuan Li, Jing Bi, Susan Liang, Chenliang Xu, Donald S. Likosky, Jason J. Corso
General AI
We introduce SurgAtlas, the largest surgical video-language dataset to date, comprising 15,291 videos (2,391 hours) spanning 18 surgical specialties and over 5,000 procedure types, sourced entirely from publicly available YouTube content. SurgAtlas is also the first surgical video-language dataset to include open surge…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 12.4
2026-06-23 · Animesh Animesh, Satheesh K Perepu, Kaushik Dey
Research Track A · General AI
In cooperative multi-agent reinforcement learning (MARL), from a deployment perspective, it is challenging and expensive to train agents from scratch for each new environment or task. In this work, we propose GCT-MARL, a transfer learning framework that builds on the multi-view graph contrastive backbone of MAIL and au…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 12.2
2026-06-24 · Yves Ferstler, Adam Podoxin, Ty Brassington, Roman Grundkiewicz, Maite Taboada, Marzena Karpinska
General AI
AI translation of literary works is increasingly common. While the content may be rendered adequately, we do not know enough about how readers experience it in terms of immersiveness and literary effect, aspects poorly captured by automatic machine translation metrics or human evaluation targeting fluency and adequacy.…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 12.2
2026-06-24 · Yuxing Cheng, Yuan Wu, Yi Chang
Research Track A · General AI
Vision-language models (VLMs) have achieved strong performance on OCR-based benchmarks and have increasingly focused on text-rich understanding, but their robustness under controlled visual degradation remains insufficiently understood. This gap is critical for OCR reasoning, where visual corruption can induce OCR errors an…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 12.2
2026-06-24 · Alexander Schperberg, Shivam K. Panda, Abraham P. Vinod, M. K. Jawed, Stefano Di Cairano
General AI
We present RoboAtlas, a contextual Active SLAM framework that adaptively balances geometric exploration and semantic reasoning using a scalable 3D semantic mapping system, OpenRoboVox. RoboAtlas integrates frontier exploration, global semantic-map reasoning, and egocentric VLM-based reasoning through a contextual multi…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 11.4
2026-06-24 · Fangzheng Li, Aimin Zhang, Chen Lv
General AI
Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a reproducible phenomenon observed in a production Agent system: when Tool Calling and JSON Schema constraints are simultane…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 11.2
2026-06-24 · Lea Roxanne Muth, Marian Margraf
Research Track A · General AI
This paper presents a novel approach to perform semi-automated BSI IT-Grundschutz certification using a Multi-Large Language Model system (MLS) with Hybrid Retrieval-Augmented Generation (HybridRAG). Facing the challenges of the Network and Information Security Directive 2 (NIS2), a shortage of specialists, and…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 11.2
2026-06-24 · Poojitha Thota, Shirin Nilizadeh
General AI
Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model behavior. In this setting, adversaries manipulate fine-tuning data to induce persistent sum…
- Review: pending
- Role: unreviewed
- Read: now
arxiv
Score 11.2
2026-06-24 · Babak Rahmani, Sebastian Dziadzio, Joschka Strüber, Sergio Hernández-Gutiérrez, Matthias Bethge
General AI
For most of scientific history, researchers studying behavior could only infer hidden mechanisms from outward actions: an inverse problem that becomes more tractable when observation is augmented by targeted intervention. We pose a computational analogue: given only behavioral traces of an agent in a game environment, …
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 11.0
2026-06-18 · Kaiyue Yang, Yuyan Bu, Jingwei Yi, Yuchi Wang, Biyu Zhou, Juntao Dai, Songlin Hu, Yaodong Yang
General AI
As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool sel…
- Review: pending
- Role: unreviewed
- Read: now
huggingface
Score 10.0
2026-06-19 · Jiehui Huang, Yuechen Zhang, Bin Xia, Jiahao Wang, Xu He, Zhenchao Tang, Meng Chu, Xin Tao, Pengfei Wan, Jiaya Jia
General AI
Generating a coherent multi-shot video requires structured cross-shot memory. Subject appearance, scene context, and speaker identity must persist across cuts. Existing approaches either train end-to-end over fixed-length sequences and cannot scale, generate shot-by-shot with memory banks that grow linearly, or orchest…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 9.2
2026-06-24 · Aditya Singh, Gerson Kroiz, Senthooran Rajamanoharan, Neel Nanda
General AI
A central goal of safety research is determining whether a model is misaligned. Prior work has largely focused on detecting concerning behavior. But behavior alone does not establish misalignment: a concerning action can arise from benign causes such as confusion. This motivates model forensics: investigating whether t…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 9.2
2026-06-24 · Andrei Liviu Nicolicioiu, Mohammad Pezeshki, Aaron Courville
General AI
On-policy self-distillation achieves strong pass@1 accuracy by using a single model as both teacher and student, with the teacher conditioned on a correct demonstration to provide dense token-level feedback. We show that this could come at a hidden cost: rollout diversity decreases and pass@k curves flatten (i.e., gene…
- Review: pending
- Role: unreviewed
- Read: soon
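For the on-policy self-distillation entry above, pass@k is the probability that at least one of k sampled solutions to a problem is correct. A minimal Python sketch of the standard unbiased estimator from the Codex paper (Chen et al., 2021), `1 - C(n-c, k) / C(n, k)` for n samples of which c are correct, shows what a "flattened" pass@k curve looks like. The per-problem correct counts below are hypothetical illustrations, not results from the paper.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # 1 minus the probability that a random size-k subset contains no correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over problems, each sampled n times."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

# Hypothetical: both models solve half of all samples, so pass@1 is identical.
collapsed = [10, 0]  # always right on one problem, never on the other (no diversity)
diverse = [5, 5]     # right half the time on both problems
for name, counts in [("collapsed", collapsed), ("diverse", diverse)]:
    print(name, [round(mean_pass_at_k(counts, 10, k), 3) for k in (1, 5)])
```

Both hypothetical models score 0.5 at k = 1, but only the diverse one improves with more samples (to about 0.996 at k = 5); the collapsed one stays flat at 0.5, which is the pattern the abstract calls a hidden cost of reduced rollout diversity.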
arxiv
Score 9.2
2026-06-24 · Seth Dobrin, Łukasz Chmiel
General AI
AI agents are granted access to tools, APIs, and other infrastructure, making them active principals in those systems. The dominant approach places controls inside the agent's own runtime: system prompts, output filters, and guardrail libraries. Any control in the agent's address space is reachable by inputs that influ…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.2
2026-06-24 · Fariba Tohidinezhad, Douwe J. Spaanderman, Natalia Oviedo Acosta, Kaouther Mouheb, Karthik Prathaban, David F. Hanff, Dirk J. Grünhagen, Cornelis Verhoef, Joris M. van Sabben, Evelyne Roets, Jette J. Slettenhaar, Hans Gelderblom, Ingrid M. E. Desar, Anna K. L. Reyners, Neeltje Steeghs, Stefan Klein, Martijn P. A. Starmans
General AI
Background: Response to neoadjuvant imatinib in gastrointestinal stromal tumors (GISTs) is highly variable and cannot be reliably predicted using current clinical or molecular markers. This study developed and evaluated an explainable multimodal deep learning framework integrating computed tomography (CT) imaging and c…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.2
2026-06-24 · Giulian Biolo, Michael Tezza, Yuanjun Gong, Fabio Massacci
General AI
Software vulnerability remediation is a cognitively demanding task that requires specialized security expertise often lacking in general developers. Meanwhile, tools assisted by Large Language Models (LLMs) show potential in vulnerability detection, localization, and repair tasks. [Hypothesis:] While LLM-assistance is h…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.2
2026-06-24 · JoungBin Lee, Jaewoo Jung, Jongmin Lee, Tongmin Kim, Hyunsung Kim, Takuya Narihira, Kazumi Fukuda, Jahyeok Koo, Jisang Han, Yuki Mitsufuji, Seungryong Kim
General AI
Synthesizing a novel-view video from a monocular reference video along a target camera trajectory requires both geometric consistency and motion fidelity with respect to the reference video. Existing methods based on explicit 3D representations are limited by the accuracy of off-the-shelf reconstruction modules, which …
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 8.2
2026-06-24 · Alexandre Bouayad
General AI
Large language models (LLMs) attain remarkable surface fluency on code, yet they neither formally guarantee the syntactic validity of their output nor leverage the hierarchical structure defining the target language. While existing constrained-decoding frameworks address the former, they operate under rigid assumptions…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 7.2
2026-06-24 · Eyasu Getahun Chekole, Howard Halim, Daniël Reijsbergen, Jianying Zhou
General AI
Biometric authentication systems are increasingly deployed in security-critical applications, yet existing physiological and behavioral biometrics suffer from fundamental limitations: 1) they are vulnerable to spoofing attacks due to unreliable liveness detection, 2) biometric templates may leak privacy-sensitive infor…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 6.2
2026-06-24 · Botao He, Zhi Wang, Linna Kuang, Ishaan Ghosh, Jitendra Malik, Cornelia Fermuller, Tingfan Wu, Jiayuan Mao, Ruoshi Liu, Haozhi Qi, Yiannis Aloimonos
General AI
Human demonstrations are a scalable data source for learning robot manipulation policies. However, common sources of human demonstration data, such as motion-capture trajectories and internet videos, capture mostly motion and appearance while missing the contact forces that are critical for force-sensitive manipulation…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 6.2
2026-06-24 · Qiyang Lyu, Zhenyu Wu, Wei Wang, Hongming Shen, Danwei Wang
General AI
Localization in challenging environments, such as GNSS-denied, geometrically repetitive, or textureless scenes commonly found in offices, hotels, and underground parking facilities, remains an open problem for reliable autonomous mobile robot (AMR) deployment. Single-modality localization methods are inherently limited…
- Review: pending
- Role: unreviewed
- Read: soon
arxiv
Score 6.2
2026-06-24 · Lawrence S. Moss, Arthur Paul Pedersen
General AI
This theoretical note studies the finite axiomatizability of strict majority reasoning in finite social decision frames. Moss and Pedersen (2026) <doi: 10.48550/arXiv.2606.23853> introduce a coherence criterion that characterizes exactly when qualitative majority judgments are representable by a finitely additive measu…
- Review: pending
- Role: unreviewed
- Read: soon
huggingface
Score 5.4
2026-06-24 · Kaiwen Zheng, Guande He, Min Zhao, Jintao Zhang, Huayu Chen, Jianfei Chen, Chen-Hsuan Lin, Ming-Yu Liu, Jun Zhu, Qianli Ma
General AI
Autoregressive video diffusion with causal diffusion transformers has emerged as a major paradigm for real-time streaming video generation and action-conditioned interactive world models. In this work, we extend rCM, an advanced diffusion distillation framework, to autoregressive video diffusion. The core philosophy of…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 5.2
2026-06-24 · Sen Li, Haichao Cui, Chendong Shao, Yaqi Wang, Xinhua Tang
General AI
Supervised deep learning has been widely used for weld penetration state classification; however, its performance often degrades significantly under domain shift, such as when transferring models between welding processes with distinct physical mechanisms: for instance, from arc-dominated tungsten inert gas (TIG) weldin…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 5.2
2026-06-24 · Yichuan Cao, Dakai Guo, Ruichen Qiu, Ruyong Feng, Xiao-Shan Gao
General AI
In this paper, it is proved that any nonnegative integer can be written in the following form $$ x(x+1)/2 + y(3y+1)/2 + z(5z+1)/2, \qquad x,y,z \in \mathbb{N}. $$ This settles the conjecture recorded as OEIS A287616. All parts of the proof have been formalized in Lean 4, with the exception of two results: one externall…
- Review: pending
- Role: unreviewed
- Read: later
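The OEIS A287616 entry above states a universality result: every nonnegative integer n equals x(x+1)/2 + y(3y+1)/2 + z(5z+1)/2 for some x, y, z in ℕ (taken here to include 0, which the case n = 0 requires). A brute-force search is not a proof, but this Python sketch (function names hypothetical) finds an explicit witness for each n up to a chosen bound as a quick sanity check on the statement. Each summand has the form x(ax + 1)/2 with a odd, so the integer division is exact.

```python
def terms(limit, a):
    """Values x*(a*x + 1)//2 for x = 0, 1, 2, ... that do not exceed limit (a odd)."""
    out, x = [], 0
    while True:
        v = x * (a * x + 1) // 2
        if v > limit:
            return out
        out.append(v)
        x += 1

def find_representation(n):
    """Return (x, y, z) with x(x+1)/2 + y(3y+1)/2 + z(5z+1)/2 == n, or None."""
    tri = {v: x for x, v in enumerate(terms(n, 1))}  # triangular value -> x
    for y, p in enumerate(terms(n, 3)):
        for z, q in enumerate(terms(n - p, 5)):
            x = tri.get(n - p - q)  # remainder must be a triangular number
            if x is not None:
                return (x, y, z)
    return None

def first_counterexample(limit):
    """Smallest n <= limit with no representation, or None if every n checks out."""
    for n in range(limit + 1):
        if find_representation(n) is None:
            return n
    return None

print(first_counterexample(2000))
```

If the theorem holds, the search prints `None`; any integer it printed instead would contradict the stated result.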
arxiv
Score 5.2
2026-06-24 · Mark Whitmeyer
General AI
Labels (grades, credentials, scores, ratings, ranks) do two things. They inform receivers, and they give agents something to chase. I study optimal classification when labels must be earned through costly self-selection. I show that exact certification is inefficiently fine: pooling a small bottom interval saves fi…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 5.2
2026-06-24 · Haoyu Lin, Yiyan Liao, Jinmei Pan, Xinliao Ling, Luhua Lai, Jianfeng Pei
General AI
Molecular generation is a central challenge in drug discovery, requiring models that explore vast chemical space while satisfying diverse design constraints. We present Molexar, a unified multimodal molecular foundation model built on Fragment-SELFIES, a robust, fragment-aware molecular language with validity-preservin…
- Review: pending
- Role: unreviewed
- Read: later
arxiv
Score 5.2
2026-06-24 · Yue Gruszecki, Elliot Anshelevich
General AI
We study strategic facility location, in which $n$ agents are located in an arbitrary metric space, and the goal is to choose $k$ facilities to minimize the total agent cost. The agents can have two types of individual cost functions: max-type where the agent wants to minimize the maximum distance from themselves to an…
- Review: pending
- Role: unreviewed
- Read: later