Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

Aryo Pradipta Gema, Beatrice Alex, Pasquale Minervini

huggingface Score 19.8

Published 2026-07-01 · First seen 2026-07-04

General AI

Abstract

In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than copying it verbatim. Identifying which attention heads perform this synthesis matters for interpreting long-context model behavior. Yet existing detectors miss these heads by construction: they reward heads whose attended token matches the generated token, a literal-copy criterion that captures where a head reads but not what it writes through its output-value (OV) circuit, the very mechanism that carries non-literal retrieval. We introduce Logit-Contribution Scoring (LOCOS), a write-aware detector that scores each head by the projection of its OV-circuit output onto the answer-token unembedding direction, contrasting needle and off-needle source positions in a single forward pass. Across three model families (Qwen3, Gemma-3, OLMo-3.1), mean-ablating the top LOCOS heads on the NoLiMa non-literal retrieval benchmark collapses ROUGE-L at lower head counts than prior attention-based detectors; on Qwen3-8B, ablating 50 heads drives ROUGE-L from 0.401 to 0.000 while the strongest baseline still retains 0.292. The selected heads are retrieval-specific: parametric recall and arithmetic reasoning stay at baseline under the same ablation. On Qwen3-8B, the same ablation also drops MuSiQue from 0.55 to 0.08 and BABI-Long from 0.62 to 0.20, while a random-heads control stays within 0.05 of baseline.
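
To make the scoring idea in the abstract concrete, here is a minimal NumPy sketch of a write-aware head score for a single head at a single query position. It is an illustration under stated assumptions, not the authors' implementation: the function name `head_logit_contribution`, the argument layout, the omission of the final layer norm, and the choice to contrast needle and off-needle positions by a simple difference of summed contributions are all hypothetical. The abstract does not specify these details.

```python
import numpy as np

def head_logit_contribution(attn, values, W_O, u_answer, needle_mask):
    """Hypothetical write-aware score for one attention head.

    attn:        (S,) attention weights from the answer-generating query
                 position to each of the S source positions.
    values:      (S, d_head) value vectors at the source positions.
    W_O:         (d_head, d_model) this head's slice of the output projection.
    u_answer:    (d_model,) unembedding direction of the answer token.
    needle_mask: (S,) boolean, True where the source position lies in the needle.
    """
    # What each source position would write toward the answer logit:
    # the OV-circuit output (values @ W_O) projected onto u_answer.
    ov_projection = values @ W_O @ u_answer            # shape (S,)
    # Weight each position's write by how much the head attends to it.
    per_position = attn * ov_projection                # shape (S,)
    # Assumed contrast: needle contribution minus off-needle contribution.
    needle = per_position[needle_mask].sum()
    off_needle = per_position[~needle_mask].sum()
    return float(needle - off_needle)

# Toy usage: 4 source positions, d_head = 2, d_model = 3.
attn = np.array([0.1, 0.6, 0.2, 0.1])
values = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
W_O = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
u_answer = np.array([1.0, 0.0, 0.0])
needle_mask = np.array([False, True, False, False])
score = head_logit_contribution(attn, values, W_O, u_answer, needle_mask)
```

Unlike a literal-copy criterion, this score never compares the attended token with the generated token. A head scores highly only if what it writes from the needle positions pushes the answer-token logit up more than what it writes from elsewhere.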

BibTeX

@misc{gema2026logit,
  title = {Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads},
  author = {Aryo Pradipta Gema and Beatrice Alex and Pasquale Minervini},
  year = {2026},
  abstract = {In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than literally copy-pasting them. Identifying which attention heads perform this synthesis matters for interpreting long-context model behavior. Yet existing detectors miss these heads by construction: they reward heads whose attended token matches the generated token, a literal-copy criterion that captures where a head reads but not what it writes through its output-value },
  url = {https://huggingface.co/papers/2607.01002},
  keywords = {attention heads, logit-contribution scoring, OV-circuit, answer-token unembedding, non-literal retrieval, ROUGE-L, parametric recall, arithmetic reasoning, MuSiQue, BABI-Long, code available, huggingface daily},
  eprint = {2607.01002},
  archiveprefix = {arXiv},
}