Paper Detail

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

Xinru Yan, Boxi Cao, Yaojie Lu, Hongyu Lin, Weixiang Zhou, Le Sun, Xianpei Han

huggingface Score 9.5

Published 2026-04-18 · First seen 2026-04-21

General AI

Abstract

Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this gap, we first systematically quantify modality preference of OLLMs using a newly-curated conflict-based benchmark and the modality selection rate metric. Our evaluation of ten representative OLLMs reveals a notable paradigm shift: unlike the "text-dominance" of traditional VLMs, most OLLMs exhibit a pronounced visual preference. To further understand the underlying mechanism, we conduct layer-wise probing and demonstrate that such modality preference is not static but emerges progressively in the mid-to-late layers. Building upon these insights, we leverage these internal signals to diagnose cross-modal hallucinations, achieving competitive performance across three downstream multi-modal benchmarks without task-specific data. Our work provides both a mechanistic understanding and a practical tool for building more trustworthy OLLMs. Our code and related resources are publicly available at: https://github.com/icip-cas/OmniPreference
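
The abstract names a modality selection rate metric but does not define it here. As a rough illustration only, the sketch below assumes the metric is the fraction of conflict samples on which the model's answer agrees with the answer supported by exactly one modality. The `ConflictSample` record, its field names, and the `"other"` bucket are hypothetical and are not taken from the paper or its repository.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ConflictSample:
    """Hypothetical record: each modality supports a different answer."""
    answers_by_modality: Dict[str, str]  # e.g. {"text": "cat", "vision": "dog"}
    model_answer: str


def modality_selection_rate(samples: List[ConflictSample]) -> Dict[str, float]:
    """Fraction of samples whose model answer matches each modality.

    Assumption: a sample counts toward a modality only when the model's
    answer matches that modality's answer and no other; everything else
    is counted under "other".
    """
    if not samples:
        return {}
    counts: Counter = Counter()
    for sample in samples:
        matched = [
            modality
            for modality, answer in sample.answers_by_modality.items()
            if answer == sample.model_answer
        ]
        counts[matched[0] if len(matched) == 1 else "other"] += 1
    return {key: value / len(samples) for key, value in counts.items()}


# Toy data, invented for illustration.
demo = [
    ConflictSample({"text": "cat", "vision": "dog"}, "dog"),
    ConflictSample({"text": "red", "vision": "blue"}, "blue"),
    ConflictSample({"text": "two", "vision": "three"}, "two"),
    ConflictSample({"text": "yes", "vision": "no"}, "maybe"),
]
rates = modality_selection_rate(demo)
```

Under this assumed definition, a model with a visual preference would show a higher rate for `"vision"` than for `"text"` on conflicting inputs.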

BibTeX

@misc{yan2026beyond,
  title = {Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models},
  author = {Xinru Yan and Boxi Cao and Yaojie Lu and Hongyu Lin and Weixiang Zhou and Le Sun and Xianpei Han},
  year = {2026},
  abstract = {Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this gap, we first systematically quantify modality preference of OLLMs using a newly-curated conflict-based benchmark and the modality selection rate metric. Our evaluation of ten representative OLLMs reveals a notable paradigm shift: unlike the ``text-domi},
  url = {https://huggingface.co/papers/2604.16902},
  keywords = {omni-modal large language models, modality preference, conflict-based benchmark, modality selection rate, cross-modal hallucinations, layer-wise probing, code available, huggingface daily},
  eprint = {2604.16902},
  archiveprefix = {arXiv},
}
