Paper Detail

Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models

Yibin Zhao, Fangxin Shang, Dingrui Yang, Yuqi Wang

arXiv · Score 7.8

Published 2026-05-29 · First seen 2026-06-01

General AI

Abstract

Table question answering requires models to recover semantic relations encoded implicitly by two-dimensional layout, merged cells, and hierarchical headers. Current pipelines typically use HTML or Markdown as intermediate table representations, but these layout-oriented serializations introduce markup overhead and require large language models to infer header-cell alignments from row and column spans. We propose Semantic Triplet Restoration (STR), a protocol that rewrites each cell as an atomic fact <item path, feature path, value>, where the item path specifies the row-wise entity, the feature path specifies the hierarchical attribute, and the value contains the cell content. We also present TripletQL, a lightweight query-aware router that uses STR to select an appropriate rendering or filtered subset of triplets for each question. Across four Chinese and English table-QA benchmarks, STR matches or improves upon HTML-based baselines while reducing input tokens. The relative benefit grows for smaller language models and longer table contexts, suggesting that explicit semantic representations are especially useful under constrained inference budgets. Code and data are available at https://github.com/Phoenix-ni/STR.git .
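
The triplet rewrite described in the abstract can be illustrated with a minimal sketch. It is not taken from the STR repository: the input layout (one header path per row and per column, with merged cells already expanded), the names `Triplet`, `table_to_triplets`, `render`, and `filter_by_keywords`, and the ` > ` path separator are all assumptions made for illustration. The keyword filter only gestures at the idea of selecting a question-relevant subset of triplets; the abstract does not describe how TripletQL actually routes queries.

```python
import re
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Triplet(NamedTuple):
    """One cell as an atomic fact (hypothetical container, not the paper's API)."""
    item_path: Tuple[str, ...]     # row-wise entity, outermost header first
    feature_path: Tuple[str, ...]  # hierarchical attribute, outermost header first
    value: str                     # cell content


def table_to_triplets(
    row_paths: Sequence[Sequence[str]],
    col_paths: Sequence[Sequence[str]],
    cells: Sequence[Sequence[Optional[str]]],
) -> List[Triplet]:
    """Emit one triplet per non-empty cell.

    Assumes merged headers were already expanded, so every row and every
    column carries its full header path.
    """
    triplets = []
    for item_path, row in zip(row_paths, cells):
        for feature_path, value in zip(col_paths, row):
            if value is None or value == "":
                continue  # skip empty cells; they carry no fact
            triplets.append(Triplet(tuple(item_path), tuple(feature_path), str(value)))
    return triplets


def render(t: Triplet, sep: str = " > ") -> str:
    """Serialize a triplet as <item path, feature path, value>."""
    return f"<{sep.join(t.item_path)}, {sep.join(t.feature_path)}, {t.value}>"


def filter_by_keywords(triplets: Sequence[Triplet], question: str) -> List[Triplet]:
    """Naive stand-in for query-aware selection: keep the triplets whose path
    labels match the most words in the question. Only single-word labels can
    match; if nothing matches, all triplets are returned unchanged."""
    words = set(re.findall(r"\w+", question.lower()))

    def score(t: Triplet) -> int:
        return sum(label.lower() in words for label in t.item_path + t.feature_path)

    best = max((score(t) for t in triplets), default=0)
    if best == 0:
        return list(triplets)
    return [t for t in triplets if score(t) == best]


if __name__ == "__main__":
    # Toy table: rows grouped under "Asia", columns grouped under "Revenue".
    facts = table_to_triplets(
        row_paths=[("Asia", "China"), ("Asia", "Japan")],
        col_paths=[("Revenue", "2023"), ("Revenue", "2024")],
        cells=[["10", "12"], ["7", None]],
    )
    for fact in filter_by_keywords(facts, "What was Japan revenue in 2023?"):
        print(render(fact))
```

Because each triplet carries its full header paths, no span arithmetic is needed to align a value with its headers, which is the property the abstract contrasts with HTML and Markdown serializations.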

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{zhao2026semantic,
  title = {Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models},
  author = {Yibin Zhao and Fangxin Shang and Dingrui Yang and Yuqi Wang},
  year = {2026},
  abstract = {Table question answering requires models to recover semantic relations encoded implicitly by two-dimensional layout, merged cells, and hierarchical headers. Current pipelines typically use HTML or Markdown as intermediate table representations, but these layout-oriented serializations introduce markup overhead and require large language models to infer header-cell alignments from row and column spans. We propose Semantic Triplet Restoration (STR), a protocol that rewrites each cell as an atomic },
  url = {https://arxiv.org/abs/2605.31550},
  keywords = {cs.CL},
  eprint = {2605.31550},
  archiveprefix = {arXiv},
}
