Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

Ziwen Zhao, Menglin Yang

huggingface Score 18.4

Published 2026-05-01 · First seen 2026-05-05

General AI

Abstract

Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods, designed for single-document retrieval, face three challenges when scaling to cross-document multi-hop questions: (1) poor distribution adaptability, where k-means clustering introduces noise due to rigid distribution assumptions; (2) structural isolation, as tree indexes lack explicit cross-document connections; and (3) coarse abstraction, which obscures fine-grained details. To address these limitations, we propose Ψ-RAG, a Tree-RAG framework with two key components. The first is a hierarchical abstract tree index, built through an iterative "merging and collapse" process that adapts to the data distribution without a priori assumptions. The second is a multi-granular retrieval agent that interacts with the knowledge base through reorganized queries and an agent-powered hybrid retriever. Ψ-RAG supports diverse tasks, from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, it outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score. Code is available at https://github.com/Newiz430/Psi-RAG.
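The abstract reports results as an average F1 score on QA benchmarks. For readers unfamiliar with the metric, the following is a minimal sketch of token-overlap F1 as it is commonly computed for extractive QA. The whitespace tokenization, the lowercasing, and the function names `token_f1` and `average_f1` are assumptions made for illustration; the paper's exact answer normalization is not stated in the abstract.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def average_f1(pairs) -> float:
    """Mean token F1 over a non-empty list of (prediction, reference) pairs."""
    return sum(token_f1(p, r) for p, r in pairs) / len(pairs)
```

Under these assumptions, a verbose prediction such as "the city of paris" against the reference "paris" has precision 0.25 and recall 1.0, so it is penalized for the extra tokens even though it contains the right answer.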

BibTeX

@misc{zhao2026hierarchical,
  title = {Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation},
  author = {Ziwen Zhao and Menglin Yang},
  year = {2026},
  url = {https://huggingface.co/papers/2605.00529},
  keywords = {retrieval-augmented generation, tree-based RAG, k-means clustering, hierarchical abstract tree index, multi-granular retrieval, cross-document multi-hop questions, document-level summarization, token-level question answering, code available, huggingface daily},
  eprint = {2605.00529},
  archiveprefix = {arXiv},
}
