Paper Detail

PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems

Suraj Ranganath, Anish Raghavendra

huggingface Score 10.5

Published 2026-06-07 · First seen 2026-06-09

General AI

Abstract

Enterprise property graphs vary widely in schema structure, internal terminology, domain assumptions, governance constraints, and user interaction patterns. A deployment-relevant Text2Cypher benchmark therefore reflects the questions users and agents actually ask of that graph. Creating such a benchmark is difficult because schemas and values are unique, and graph structure changes over time. Each NL-query pair must also be executable, use real graph entities, preserve diversity, and remain balanced across query types and difficulty levels. We present PIPE-Cypher, a local benchmark-generation pipeline that turns a live property graph and optional seed queries from customer questions, analyst logs, or agent tool calls into balanced NL-to-Cypher benchmarks. PIPE-Cypher combines schema profiling, reverse-query grounding, constrained generation, deterministic Cypher governance, execution validation, redaction, diversity controls, and a calibrated local LLM judge. Using local Qwen3.5-9B generation and judging, PIPE-Cypher exports 3,000 accepted FinBench/SNB examples, completes three audited ablation suites, calibrates judge behavior with human labels, and evaluates 11 local downstream models. The resulting benchmark is deliberately discriminative: zero-shot transfer is weak, while a few-shot control shows that schema-specific example banks can help compatible model families. Together, PIPE-Cypher makes Text2Cypher benchmarking a repeatable process that evolves with the graph, its users, and its target workloads.
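
Two of the stages named in the abstract, deterministic Cypher governance and balancing across query types and difficulty levels, can be illustrated with a minimal sketch. The abstract does not specify PIPE-Cypher's actual rules, so everything here is an assumption made for illustration: the keyword deny-list, the function names `is_read_only` and `balance`, and the record fields `cypher`, `query_type`, and `difficulty` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical deny-list of Cypher write clauses; the paper's real
# governance rules are not given in the abstract.
WRITE_KEYWORDS = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b",
    re.IGNORECASE,
)

# Simple quoted-string matcher (ignores escaped quotes for brevity).
STRING_LITERAL = re.compile(r"'[^']*'|\"[^\"]*\"")


def is_read_only(cypher: str) -> bool:
    """Return True if the query contains no deny-listed keyword."""
    # Blank out string literals so a value such as 'Set' is not flagged.
    stripped = STRING_LITERAL.sub("''", cypher)
    return WRITE_KEYWORDS.search(stripped) is None


def balance(candidates, per_bucket):
    """Keep at most per_bucket read-only examples per (query_type, difficulty)."""
    buckets = defaultdict(list)
    for example in candidates:
        key = (example["query_type"], example["difficulty"])
        if is_read_only(example["cypher"]) and len(buckets[key]) < per_bucket:
            buckets[key].append(example)
    return [ex for group in buckets.values() for ex in group]


candidates = [
    {"cypher": "MATCH (a:Account) RETURN count(a)",
     "query_type": "aggregation", "difficulty": "easy"},
    {"cypher": "MATCH (a:Account) DELETE a",
     "query_type": "aggregation", "difficulty": "easy"},
    {"cypher": "MATCH (p:Person {name: 'Set'}) RETURN p",
     "query_type": "lookup", "difficulty": "easy"},
    {"cypher": "MATCH (a:Account) RETURN avg(a.balance)",
     "query_type": "aggregation", "difficulty": "easy"},
]
accepted = balance(candidates, per_bucket=1)
```

A keyword filter like this is deliberately conservative: it would also reject a read-only query that uses a label or property named after a write clause. A production check would more plausibly parse the query rather than pattern-match it.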

BibTeX

@misc{ranganath2026pipe,
  title = {PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems},
  author = {Suraj Ranganath and Anish Raghavendra},
  year = {2026},
  abstract = {Enterprise property graphs vary widely in schema structure, internal terminology, domain assumptions, governance constraints, and user interaction patterns. A deployment-relevant Text2Cypher benchmark therefore reflects the questions users and agents actually ask of that graph. Creating such a benchmark is difficult because schemas and values are unique, and graph structure changes over time. Each NL-query pair must also be executable, use real graph entities, preserve diversity, and remain bala},
  url = {https://huggingface.co/papers/2606.08481},
  keywords = {Text2Cypher, property graphs, benchmark-generation pipeline, schema profiling, reverse-query grounding, constrained generation, Cypher governance, execution validation, redaction, diversity controls, local LLM judge, FinBench, SNB, zero-shot transfer, few-shot learning, code available, huggingface daily},
  eprint = {2606.08481},
  archiveprefix = {arXiv},
}
