Paper Detail

Xetrieval: Mechanistically Explaining Dense Retrieval

Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li, Yichi Zhang, Taichuan Li, Zhuofan Chen, Zixia Jia, Zilong Zheng, Wenge Rong

huggingface Score 13.0

Published 2026-05-28 · First seen 2026-05-31

General AI

Abstract

Explaining why dense retrievers assign high relevance scores remains challenging because retrieval decisions are made through opaque high-dimensional embeddings. Existing explanations often focus on surface signals, such as lexical matches, token alignments, or post-hoc textual rationales, and thus provide limited insight into the latent factors that shape dense retrieval behavior at the embedding level. We propose Xetrieval, an embedding-level mechanistic framework for explaining dense retrieval. Xetrieval first introduces a lightweight reasoning internalizer that approximates Chain-of-Thought reasoning directly in the embedding space with a single forward pass, enriching sentence embeddings with reasoning-oriented information while avoiding expensive autoregressive generation. It then decomposes these reasoning-enhanced embeddings into sparse, human-interpretable features, each associated with a coherent natural language description. By aggregating sparse feature overlaps across multiple document-side views, Xetrieval provides feature-level explanations of individual retrieval decisions. Experiments on diverse retrievers and benchmarks show that Xetrieval uncovers coherent interpretable features, yields stronger pair-level intervention effects, and supports task-level feature steering. The project page and source code are available at https://hihiczx.github.io/Xetrieval .
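The abstract describes the explanation step only at a high level. The following is a rough, hypothetical illustration of how sparse feature overlaps might be aggregated across document-side views into a feature-level explanation of a single query-document pair. It assumes a TopK sparse-autoencoder encoder (`topk_sparse_encode`). It scores each feature active on both the query and a document view by the product of the two activations, then averages those scores over all views. The function names, the hand-set toy weights `W_ENC`/`B_ENC`, the product scoring rule, and mean aggregation are all assumptions for illustration, not Xetrieval's published implementation. The reasoning internalizer is omitted, and the input embeddings are taken as given.

```python
# Hypothetical sketch, NOT Xetrieval's actual code: the encoder form, scoring
# rule, and aggregation below are illustrative assumptions.
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def topk_sparse_encode(x: Vector, w_enc: Sequence[Vector], b_enc: Vector, k: int) -> Dict[int, float]:
    """Assumed TopK sparse-autoencoder encoder.

    Computes ReLU(W_enc @ x + b_enc), keeps only the k largest activations,
    and returns {feature_index: activation} for the nonzero ones.
    """
    pre = [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(w_enc, b_enc)]
    acts = [max(0.0, v) for v in pre]
    top = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k]
    return {i: acts[i] for i in top if acts[i] > 0.0}


def feature_overlap(q_feats: Dict[int, float], d_feats: Dict[int, float]) -> Dict[int, float]:
    """Score each feature active on both sides by the product of its activations."""
    return {i: q_feats[i] * d_feats[i] for i in q_feats.keys() & d_feats.keys()}


def explain_pair(
    q_emb: Vector,
    doc_view_embs: List[Vector],
    w_enc: Sequence[Vector],
    b_enc: Vector,
    k: int,
    top_n: int,
) -> List[Tuple[int, float]]:
    """Rank features by their overlap score averaged over all document views."""
    q_feats = topk_sparse_encode(q_emb, w_enc, b_enc, k)
    totals: Dict[int, float] = {}
    for view in doc_view_embs:
        d_feats = topk_sparse_encode(view, w_enc, b_enc, k)
        for i, score in feature_overlap(q_feats, d_feats).items():
            totals[i] = totals.get(i, 0.0) + score
    n_views = len(doc_view_embs)
    averaged = {i: s / n_views for i, s in totals.items()}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy example: 3-dim embeddings and 4 hand-set features (not learned weights).
W_ENC = [
    [1.0, 0.0, 0.0],     # feature 0 reads dimension 0
    [0.0, 1.0, 0.0],     # feature 1 reads dimension 1
    [0.0, 0.0, 1.0],     # feature 2 reads dimension 2
    [-1.0, -1.0, -1.0],  # feature 3 stays inactive for these nonnegative inputs
]
B_ENC = [0.0, 0.0, 0.0, 0.0]
query = [0.9, 0.5, 0.1]
doc_views = [[0.8, 0.0, 0.6], [0.4, 0.7, 0.0]]  # two hypothetical views of one document

for feature, score in explain_pair(query, doc_views, W_ENC, B_ENC, k=2, top_n=3):
    print(f"feature {feature}: {score:.3f}")
```

In this toy run, feature 0 ranks above feature 1 because it is active in the query and in both document views, while feature 1 overlaps with only one view. Averaging over views, rather than taking the maximum, is one possible design choice; it favors features that a document expresses consistently across its views.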

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@misc{cai2026xetrieval,
  title = {Xetrieval: Mechanistically Explaining Dense Retrieval},
  author = {Zhixin Cai and Jun Bai and Yang Liu and Jiaqi Li and Yichi Zhang and Taichuan Li and Zhuofan Chen and Zixia Jia and Zilong Zheng and Wenge Rong},
  year = {2026},
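  abstract = {Explaining why dense retrievers assign high relevance scores remains challenging because retrieval decisions are made through opaque high-dimensional embeddings. Existing explanations often focus on surface signals, such as lexical matches, token alignments, or post-hoc textual rationales, and thus provide limited insight into the latent factors that shape dense retrieval behavior at the embedding level. We propose Xetrieval, an embedding-level mechanistic framework for explaining dense retrieval. Xetrieval first introduces a lightweight reasoning internalizer that approximates Chain-of-Thought reasoning directly in the embedding space with a single forward pass, enriching sentence embeddings with reasoning-oriented information while avoiding expensive autoregressive generation. It then decomposes these reasoning-enhanced embeddings into sparse, human-interpretable features, each associated with a coherent natural language description. By aggregating sparse feature overlaps across multiple document-side views, Xetrieval provides feature-level explanations of individual retrieval decisions. Experiments on diverse retrievers and benchmarks show that Xetrieval uncovers coherent interpretable features, yields stronger pair-level intervention effects, and supports task-level feature steering. The project page and source code are available at https://hihiczx.github.io/Xetrieval .},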
  url = {https://huggingface.co/papers/2605.29507},
  keywords = {dense retrievers, high-dimensional embeddings, Chain-of-Thought reasoning, embedding space, reasoning internalizer, sparse features, human-interpretable features, feature-level explanations, retrieval decisions, pair-level intervention effects, task-level feature steering, code available, huggingface daily},
  eprint = {2605.29507},
  archiveprefix = {arXiv},
}

Metadata

{}