The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Abstract

Extended chain-of-thought reasoning can degrade performance on deterministic state-tracking tasks, not because of preference biases but because of limits rooted in the information-theoretic capacity of decoder-only attention. We establish: (1) an Attention Bottleneck Theorem with a complementary achievability construction, bounding state-tracking capacity as $O(H \cdot \log(L/H) \cdot \sqrt{d_h})$; (2) a context-dependent error model yielding super-exponential accuracy decay; (3) the State-Space Jaccard metric, which distinguishes capability failures from preference failures; (4) a Deterministic Horizon $d^* \in [19, 31]$ beyond which tool delegation becomes necessary. Across 12 models and 8 task domains (including SWE-Bench, WebArena, and SQL-Multi), tool-integrated reasoning consistently outperforms neural chain-of-thought; on the primary model suite it reaches 86-94% accuracy versus 24-42% for neural chain-of-thought. Fine-tuning on optimal-length traces yields less than 5% improvement, confirming an architectural ceiling, and high cross-model correlation ($r = 0.81$-$0.91$) indicates that these failures are architectural rather than training-specific. Our results provide principled guidance for when pure neural reasoning should yield to hybrid approaches in agentic systems.
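
To give a feel for how the stated capacity bound scales, the sketch below evaluates only the growth term $H \cdot \log(L/H) \cdot \sqrt{d_h}$. It is a minimal illustration, not the paper's method: the abstract does not define the symbols, the hidden constant, or the base of the logarithm, so reading $H$ as number of attention heads, $L$ as context length, and $d_h$ as per-head dimension, using the natural logarithm, and the function name `capacity_growth_term` are all assumptions made here.

```python
import math


def capacity_growth_term(num_heads: int, context_len: int, head_dim: int) -> float:
    """Evaluate H * log(L / H) * sqrt(d_h) with the natural logarithm.

    Assumption: this is only the growth term inside the O(.) bound quoted
    in the abstract. The hidden constant is unknown, so the value is useful
    for comparing configurations, not as an absolute capacity.
    """
    if num_heads <= 0 or head_dim <= 0:
        raise ValueError("num_heads and head_dim must be positive")
    if context_len <= num_heads:
        raise ValueError("context_len must exceed num_heads so log(L / H) > 0")
    return num_heads * math.log(context_len / num_heads) * math.sqrt(head_dim)


# Hypothetical configuration, chosen only for illustration.
base = capacity_growth_term(num_heads=32, context_len=4096, head_dim=128)
longer = capacity_growth_term(num_heads=32, context_len=8192, head_dim=128)
wider = capacity_growth_term(num_heads=32, context_len=4096, head_dim=512)

# Doubling L adds a fixed increment H * ln(2) * sqrt(d_h) (logarithmic growth),
# while quadrupling d_h doubles the term (square-root growth).
print(f"base={base:.1f} longer_context={longer:.1f} wider_heads={wider:.1f}")
```

Under these assumptions, the term grows only logarithmically in context length, which is consistent with the abstract's claim that longer reasoning traces do not buy proportionally more state-tracking capacity.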

BibTeX

@article{guo2026deterministic,
  title = {The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary},
  author = {Dongxin Guo and Jikun Wu and Siu Ming Yiu},
  year = {2026},
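  abstract = {Extended chain-of-thought reasoning can degrade performance on deterministic state-tracking tasks, not because of preference biases but because of limits rooted in the information-theoretic capacity of decoder-only attention. We establish: (1) an Attention Bottleneck Theorem with a complementary achievability construction, bounding state-tracking capacity as $O(H \cdot \log(L/H) \cdot \sqrt{d_h})$; (2) a context-dependent error model yielding super-exponential accuracy decay; (3) the State-Space Jaccard metric, which distinguishes capability failures from preference failures; (4) a Deterministic Horizon $d^* \in [19, 31]$ beyond which tool delegation becomes necessary. Across 12 models and 8 task domains (including SWE-Bench, WebArena, and SQL-Multi), tool-integrated reasoning consistently outperforms neural chain-of-thought; on the primary model suite it reaches 86-94\% accuracy versus 24-42\% for neural chain-of-thought. Fine-tuning on optimal-length traces yields less than 5\% improvement, confirming an architectural ceiling, and high cross-model correlation ($r = 0.81$-$0.91$) indicates that these failures are architectural rather than training-specific. Our results provide principled guidance for when pure neural reasoning should yield to hybrid approaches in agentic systems.},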
  url = {https://arxiv.org/abs/2606.00376},
  keywords = {cs.AI, cs.CL, cs.LG},
  eprint = {2606.00376},
  archiveprefix = {arXiv},
}
