arxiv
Score 22.3
2026-06-10 · Shang Ma, Jisheng Dang, Wencan Zhang, Yifan Zhang, Bimei Wang, Hong Peng, Bin Hu, Qi Tian, Tat-Seng Chua
General AI
We propose a multi-agent collaborative framework built upon a lightweight Multimodal Large Language Model (MLLM), specifically designed for social intelligence reasoning. A key feature of our approach is that both the training and inference phases are augmented via knowledge distillation. Within this architecture, mult…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 21.3
2026-06-11 · Tanmoy Kanti Halder, Akash Ghosh, Subhadip Baidya, Arijit Roy, Sriparna Saha
General AI
Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where patients often express…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 21.3
2026-06-11 · Xunhao Lai, Weiqi Xu, Yufeng Yang, Qiaorui Chen, Yang Xu, Lunbin Zeng, Xiaolong Li, Haohai Sun, Haichao Zhu, Vito Zhang, Pengyu Zhao
General AI
Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untenable at deployment sc…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 21.3
2026-06-11 · King Yeung Tsang, Zihao Zhao, Vishal Venkataramani, Haizhou Shi, Zixuan Ke, Semih Yavuz, Shafiq Joty, Hao Wang
General AI
Multi-Agent Systems (MAS) built on Large Language Models (LLMs) require effective orchestration to coordinate specialized agents, yet training such orchestrators is hindered by limited supervision and high computational cost. We propose Orchestration Reward Modeling (OrchRM), a self-supervised framework for evaluating …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 21.0
2026-06-11 · Zhibao Chen, Qian Cheng
Research Track A · General AI
Long-running LLM agents accumulate interaction histories far larger than any context window, forcing a standing decision: what to encode deeply, what to forget, and what to retrieve under a fixed memory budget. Production systems answer with semantic similarity or recency -- both mis-specified for the forgetting decisi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 20.5
2026-06-10 · Longkun Hao, Hongyu Lin, Hao Li, Zhichao Yang, Haojie Hao, Dongshuo Huang, Haitao Yang, Hongyu Ge, Ming jie Xie, Yanjun Wu, Zi Hao Yin, Yan Bai, Yihang Lou
Research Track B · General AI
Training interactive web agents through imitation learning from expert trajectories has emerged as a highly effective approach. However, determining the optimal timing for expert intervention presents a critical challenge in this context. Delayed intervention often leads to the accumulation of early-stage errors, pushi…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.5
2026-06-09 · Masoume Gholizade, Fabrizio Ruffini, Pietro Ducange, Francesco Marcelloni
Research Track A · General AI
Federated Learning (FL) enables collaborative and privacy-preserving model training across distributed clients, but most existing FL systems implicitly assume data stationarity. In real-world settings such as healthcare, industrial IoT (IIoT), cybersecurity, and smart cities, data streams are inherently non-stationary, …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-11 · Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang, Fangchen Yu, Zhijie Zhong, Zijie Guo, Tianshuo Peng, Zhuo Liu, Yi Xie, Xiang Zhuang, Yue Fan, Runmin Ma, Shiyang Feng, Xiangchao Yan, Anran Liu, Peng Ye, Wenlong Zhang, Shufei Zhang, Chunfeng Song, Fenghua Ling, Jie Zhou, Liang He, Bo Zhang, Lei Bai
General AI
Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat "cites" edges, omitting key entities, claims, evidence, mechanisms, and method lineages essential for s…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 18.3
2026-06-11 · Dian Zheng, Harry Lee, Manyuan Zhang, Kaituo Feng, Zoey Guo, Ray Zhang, Hongsheng Li
General AI
Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual narratives, guidance, a…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 17.5
2026-06-04 · Ashutosh Hathidara, Sai Shruthi Sistla, Sebastian Schreiber, Sahil Bansal
General AI
Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encoders that may under-capture specialized tool semantics, parametric tool retrieval addresses this by encoding each tool as a virtual token appended to t…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.5
2026-06-10 · Ahmed Sharshar, Naveen Kumar Kummari, Mohsen Guizani
Research Track A · General AI
Continual learning (CL) models often use experience replay to reduce catastrophic forgetting, but their robustness to replay sampling interference remains underexplored. Existing CL attacks alter inputs or training pipelines (poisoning/backdoors) and rarely include explicit auditable constraints, limiting realism. Here…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.3
2026-06-11 · Jundong Xu, Qingchuan Li, Jiaying Wu, Yihuai Lan, Shuyue Stella Li, Huichi Zhou, Bowen Jiang, Lei Wang, Jun Wang, Anh Tuan Luu, Caiming Xiong, Hae Won Park, Bryan Hooi, Zhiyuan Hu
General AI
Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 17.0
2026-06-10 · Dayananda Herurkar, Federico Raue, Joachim Folz, Jörn Hees, Andreas Dengel
Research Track A · General AI
Continual anomaly detection in tabular data is challenging and remains largely underexplored, particularly in settings with heterogeneous feature schemas, distribution shifts, and severe class imbalance. In many real-world applications, data arrive sequentially from diverse domains, rendering conventional continual lea…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.3
2026-06-11 · Seokju Cho, Ryo Hachiuma, Abhishek Badki, Hang Su, Byung-Kwan Lee, Chan Hee Song, Sifei Liu, Subhashree Radhakrishnan, Seungryong Kim, Yu-Chiang Frank Wang, Min-Hung Chen
General AI
Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is bounded by the actio…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-11 · Minlin Zeng, Zhipeng Zhou, Yang Qiu, Martin J. McKeown, Zhiqi Shen
Research Track A · General AI
Gait-based Parkinson's disease assessment increasingly relies on heterogeneous sensors, but clinical systems rarely collect all modalities simultaneously. New sensors may arrive through device upgrades, protocol changes, or multi-center deployment, while historical patient data are often unavailable because of privacy …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 15.0
2026-06-11 · Ayushman Trivedi, Bhavika Melwani
Research Track A · General AI
Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverability in continual learning. Using Split CIFAR-100 and a sequentially trained ResNet-18, we analyze …
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-10 · Michal Chudoba, Sergey Alyaev, Petra Galuscakova, Tomasz Wiktorski
General AI
There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require modification to the co…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-11 · Guojun Liao
General AI
Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but neither fully captures the central act of discovery: the formation and evolution of models. This paper…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-11 · Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen, Avinash Atreya, Hanjie Chen, Vicente Ordonez
General AI
Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, wh…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-11 · Elias Lumer, Sahil Sen, Kevin Paul, Vamse Kumar Subbiah
General AI
Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between these two lines of work…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 14.3
2026-06-11 · Zihao Wang, Yiming Li, Yutong Wu, Zheyu Liu, Kangjie Chen, Fok Kar Wai, Pin-Yu Chen, Vrizlynn L. L. Thing, Bo Li, Dacheng Tao, Tianwei Zhang
Research Track B · General AI
Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions th…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.5
2026-06-10 · Megha Manoj, Sue Ann Campbell
Research Track A
Neural assemblies, transiently coordinated groups of neurons, observed in the hippocampus are thought to underlie the formation of episodic memories. Acetylcholine (ACh), a neuromodulator received by the hippocampus, plays a critical role in memory and learning. A well-supported hypothesis suggests that high l…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-11 · Arnav Kumar Jain, Yilin Wu, Jesse Farebrother, Gokul Swamy, Andrea Bajcsy
General AI
The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching -- policy evaluation, policy improvement, and test-time planning -- all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: (i) fidelity…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-11 · Charles Moslonka, Amaury de Vitry, Arthur Garnier, Hicham Randrianarivo, Emmanuel Malherbe
General AI
Finance reporting is a natural proving ground for large language models, and the very-long-context capabilities of recent models across all sizes make rigorous evaluation in this domain an increasingly pressing need. Yet most public financial resources reduce the task to plain-text SEC 10-K filings paired with a handfu…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-11 · Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu
General AI
Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool use remains underexplored because of its physical complexity and the difficulty o…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 13.3
2026-06-11 · Jiwen Liu, Shujuan Li, Zhixue Fang, Xiaohan Li, Yan Zhou, Zijie Meng, Zhimin Zhang, Yawen Luo, Guoxin Zhang, Yu-Shen Liu, Pengfei Wan
General AI
Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations that fail to handle multi-shot generation or synthesize cross-paired data, which suffer from data scarcity, resulting in …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 12.5
2026-06-11 · Siyi Chen, Xiaoyan Zhang, Meng Wu, Jonathan Tremblay, Valts Blukis, Stan Birchfield, Rene Vidal, Alvaro Velasquez, Sijia Liu, Qing Qu
General AI
Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterog…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.3
2026-06-11 · Yaxin Du, Yifan Zhou, Yujie Ge, Jiajun Wang, Xianghe Pang, Shuo Tang, Tuney Zheng, Bryan Dai, Jian Yang, Siheng Chen
General AI
Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally deterministic tool workflows are unfolded into repeated model-visible decisions, consuming…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 12.0
2026-06-09 · Jebacyril Arockiaraj, Dhruv Parikh, Jayashree Adivarahan, Rajgopal Kannan, Viktor Prasanna
Research Track A · General AI
Federated continual learning (FCL) must learn from distributed task streams under limited resources, such as communication, computation, memory, and label availability. Existing FCL methods often rely on repeated local optimization, replay, and full supervision. Analytic alternatives avoid iterative training and replay…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-10 · Haotao Xie
General AI
Recently, large language models (LLMs) have achieved promising progress in the fields of classical Chinese translation and the generation of classical poetry. However, domain-specific research on precise translation and affective-semantic understanding of classical poetry remains limited. The main challenge is that mos…
- Review
- pending
- Role
- unreviewed
- Read
- now
arxiv
Score 11.3
2026-06-11 · Zach Studdiford, Gary Lupyan
General AI
When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's behavior does not exhibit the same types of failures because human reasoning use…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-05 · Chung-En Sun, Linbo Liu, Tsui-Wei Weng
General AI
Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a session and become substantially safer after a few regular agentic tasks -- a phenomenon we term the cold-start safety gap. To study this systematically, we introduce Safety Ov…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 10.5
2026-06-09 · Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik, Colin Raffel
General AI
Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly treating all attacks as equally costly. In practice, the computational expense of different attack strategies can vary by orders of magnitude. Consequently, ASR at a fixed …
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 10.5
2026-06-11 · Yujun Zhou, Kehan Guo, Haomin Zhuang, Xiangqi Wang, Yue Huang, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Nuno Moniz, Nitesh V. Chawla, Xiangliang Zhang
General AI
Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between preference access and preference compliance. In tasks derived from anonymized real-user friction case…
- Review
- pending
- Role
- unreviewed
- Read
- now
huggingface
Score 9.5
2026-06-10 · Zhuofan Shi, Mingzhe Ma, Lu Wang, Fangkai Yang, Pu Zhao, Yiming Guan, Youling Huang, Wei Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan
General AI
Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several directions look plausible but only some will later lead to reliable evidence. If an agent greedily follows the current best-look…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 9.0
2026-06-11 · Xiaobin Zhang, Lefei Shen, Mouxiang Chen, Zhuo Li, Hongkai Li, Han Fu, Jianling Sun, Xiaoxue Ren, Chenghao Liu
Research Track A · General AI
Driven by conservative over-provisioning to guarantee service reliability, resource utilization in cloud data centers remains at low levels. To mitigate this, the forecast-then-optimize paradigm has emerged to optimize consolidation by anticipating future demands. While emerging time series foundation models promise to…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 8.5
2026-06-10 · Yuchen Xian, Yunqiu Xu, Yang He, Yi Yang
General AI
Multimodal image fusion aims to integrate complementary information from different modalities into a fused image that preserves rich local details while maintaining globally consistent appearance. Existing approaches build shared representations on 2D feature grids, which excel at modeling local structures but offer li…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 8.3
2026-06-11 · Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao, Fanjin Zhang, Jian Song, Lei Hou, Juanzi Li
General AI
LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.5
2026-06-10 · Stephen Kasica, Charles Berret, Tamara Munzner
Research Track A
Data journalists routinely integrate records across multiple independently published sources to support accountability reporting, yet no existing interactive wrangling tool treats the collection of tables -- rather than the single table -- as its primary unit of work. We present OpenRoundup, an open-source, browser-bas…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.5
2026-06-11 · Víctor Blanco, J. Fernando Camacho-Vallejo, Yolanda Hinojosa
Research Track A
Urban waste management faces increasing operational and environmental challenges driven by population growth, heterogeneous waste streams, traffic congestion, and the need for sustainable collection infrastructures. We present an integrated optimization framework for the design of multi-type urban waste collection and …
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-11 · Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger, Stefan Rose, Sarah Ball, Bolei Ma, Frauke Kreuter, Markus Weinmann, Stefan Feuerriegel
General AI
Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches are resource-intensive and difficult to scale. Here, we show that large language models (LLMs) can a…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 7.3
2026-06-11 · Jialin Gan, Xin Qiu, Guangzhe Chen, Xue Wang
Research Track A · General AI
Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamentally different information structures, making uniform token processing inefficient. In this paper, we…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 6.5
2026-06-04 · Huaisong Zhang, Hao Yu, Yuxuan Zhang, Jiahe Wang, Xinrui Chen, Haoxiang Cao, Feng Lu, Wendong Zhang, Changqian Yu, Chun Yuan
General AI
Despite generating increasingly photorealistic images, text-to-image (T2I) models still exhibit localized, subtle, and structurally complex failures. Diagnosing these failures requires instance-level feedback that answers where a defect occurs, what type it is, why it is defective, and its importance to overall image q…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 6.5
2026-06-10 · Niccolò Biondi, Federico Pernici, Simone Ricci, Alberto Del Bimbo
General AI
Learning compatible representations aims to learn feature representations that can be used interchangeably over time whenever a model undergoes updates. In this paper, we demonstrate that stationary representations learned by d-Simplex fixed classifiers imply compatibility as in its formal definition. This result estab…
- Review
- pending
- Role
- unreviewed
- Read
- soon
arxiv
Score 6.3
2026-06-11 · Andy Tang, William Chen, Andrew Wagenmaker, Chelsea Finn, Sergey Levine
General AI
Generalist policies can learn a wide range of skills from diverse robot datasets. In order to solve or improve on challenging new tasks, we need a way to infer and invoke the appropriate actions from the policy's rich behavioral prior, especially when directly commanding the policy fails. We focus on flow matching gen…
- Review
- pending
- Role
- unreviewed
- Read
- soon
huggingface
Score 5.5
2026-05-30 · Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez
General AI
Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-internalized priors interact with user-provided instructions. We investigate three dimensions of this interaction: (1) how an LLM's familiarity with data and task definitions a…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 5.5
2026-06-11 · Guozhen Zhang, Xuerui Qiu, Yutao Cui, Tianhui Song, Changlin Li, Junzhe Li, Tao Huang, Xiao Zhang, Yang Li, Jianbing Wu, Miles Yang, Zhao Zhong, Liefeng Bo, Limin Wang
General AI
Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation space. In this paper, we present HYDRA-X, the first UMM that unifies image and video tokenization within a single Vision Transformer (ViT). Our design is driven by two core chal…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.3
2026-06-10 · Tamás Lajos Tompa, Eszter Varga-Umbrich, Ilyes Batatia, Alin M. Elena, Noam Bernstein, Gábor Csányi
General AI
Adapting machine-learned interatomic potential (MLIP) foundation models to specialised tasks through fine-tuning is an increasingly important practice, yet systematic guidance on when and how to fine-tune is currently limited. We evaluate seven fine-tuning strategies -- naive full-parameter updates, two layer-freezing …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.3
2026-06-11 · Pengfei Liu, Gen Li, Junqiao Fan, Boyu Ma, Jindou Jia, Yang Xiao, Jianfei Yang
General AI
Human communication is inherently multimodal, where language is often accompanied by non-verbal cues such as gestures to convey intentions. However, current Vision-Language-Action (VLA) models treat robotic manipulation as a pure text-driven task, overlooking the important role of gestures in Human-Robot Interaction (H…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.3
2026-06-11 · Huyen Vo, María Martínez-García, Isabel Valera
General AI
Existing approaches for multimodal variational autoencoders (VAEs) face a trade-off between generative quality and coherence, i.e., they struggle to generate realistic and diverse samples that, at the same time, are semantically consistent across modalities. A recent work shows that using a simple approximation to Hölde…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.3
2026-06-11 · Dimitri Kachler, Damien Sileo, Pascal Denis
General AI
As the capabilities of Large Language Models (LLMs) grow, there has been an increasing push to curate high-quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can precondition a model to generate certain …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 5.3
2026-06-11 · Franz Louis Cesista, Katherine Crowson, Cédric Simal, Stella Biderman
General AI
Low-Rank Adaptation (LoRA) significantly reduces compute and memory costs for finetuning Deep Learning models but is often harder to tune than dense training: when using factor-wise optimizers such as AdamW, it is sensitive to initialization choices, its optimal learning rates transfer poorly across ranks, and it often…
- Review
- pending
- Role
- unreviewed
- Read
- later
huggingface
Score 4.5
2026-06-09 · Gal Bloch, Ariel Gera, Matan Orbach, Ohad Eytan, Assaf Toledo
General AI
We present Flash-GMM, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single GPU pass. By eliminating the need to materialize the full responsibility matrix in GPU memory, Flash-GMM achieves a 20× speedup over existing implementations and enables training…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-06-10 · Sk Muhammad Asif, Orhun Aydin
General AI
Understanding the spatial distribution of fallow land is important for optimizing the food-water (FW) nexus, given fallowing's role in crop rotation and water conservation. Fallow is a low-accuracy class in the USDA Cropland Data Layer (CDL). The geospatial foundation model (GFM) Prithvi-EO has shown strong transferability across…
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-06-11 · Junke Wang, Qihang Zhang, Shuai Yang, Yiming Luo, Yujun Shen, Zuxuan Wu, Yu-Gang Jiang, Yinghao Xu
General AI
This work presents RepWAM, a representation-centric world action model (WAM) built on representation visual-action tokenizers. Existing WAMs typically inherit reconstruction-oriented video tokenizers from pretrained video generation models. Although these tokenizers preserve visual fidelity, pixel reconstruction alone …
- Review
- pending
- Role
- unreviewed
- Read
- later
arxiv
Score 4.3
2026-06-11 · Nithya Shikarpur, Victor Arul, Anna Huang
General AI
Melodic material in Hindustani music is presented in relation to a tonic, usually sustained by the tanpura, a four-stringed drone instrument. Rooted in Hindustani music, 'The Moving Drone' sets the traditionally static drone into motion that, throughout the performance, gains increasing agency transitioning from reacti…
- Review
- pending
- Role
- unreviewed
- Read
- later