Paper Detail

OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling

Zhong Li, Zihan Guo, Xiaohan Lu, Juntao Wang, Jie Song, Chao Shen, Jiageng Wu, Mingyang Sun

arxiv Score 13.8

Published 2026-05-12 · First seen 2026-05-13

Research Track A · General AI

Abstract

Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization semantics. We formulate this issue as optimization-modeling hallucination detection, namely structural consistency auditing over the problem description, symbolic model, and solver implementation. We develop, to our knowledge, the first fine-grained hallucination taxonomy specifically for optimization modeling, spanning objective, variable, constraint, and implementation failures. We use this taxonomy to design OptArgus, a multi-agent detector with conductor routing, specialist auditors, and evidence consolidation. To evaluate this setting, we introduce a three-part benchmark suite with 484 clean artifacts, 1266 controlled injected artifacts, and 6292 natural LLM-generated artifacts. Against a matched single-agent baseline, OptArgus produces fewer false alarms on clean artifacts, more accurate top-ranked localization on controlled single-error cases, and stronger detection on natural model outputs. Together, these contributions turn optimization-modeling hallucination detection into a concrete empirical problem and suggest that modular, taxonomy-grounded auditing is a practical route to more reliable optimization modeling.
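
The abstract's central claim, that an artifact can match the reference objective value while changing the model's semantics, can be illustrated with a toy example. The sketch below is not taken from the paper and is not OptArgus: the two models, the constraint names, and the helpers `feasible_points`, `optimal_value`, and `audit_constraints` are all hypothetical, and the name-based comparison against a reference model is only a stand-in for the structural consistency auditing the abstract describes.

```python
from itertools import product

# Hypothetical toy models: maximize x + y over a small integer grid.
def make_model(constraints):
    return {
        "objective": lambda x, y: x + y,  # quantity to maximize
        "constraints": constraints,       # name -> predicate on (x, y)
    }

# Reference model with three named constraints (names are illustrative).
reference = make_model({
    "capacity": lambda x, y: x + y <= 4,
    "x_limit": lambda x, y: x <= 3,
    "y_limit": lambda x, y: y <= 3,
})

# Candidate model that silently drops the "x_limit" constraint.
candidate = make_model({
    "capacity": lambda x, y: x + y <= 4,
    "y_limit": lambda x, y: y <= 3,
})

# Nonnegative integer points with each coordinate in 0..5.
GRID = list(product(range(6), repeat=2))

def feasible_points(model):
    """Return the grid points that satisfy every constraint of the model."""
    return {p for p in GRID if all(c(*p) for c in model["constraints"].values())}

def optimal_value(model):
    """Brute-force the maximum objective value over the feasible grid points."""
    return max(model["objective"](*p) for p in feasible_points(model))

def audit_constraints(ref, cand):
    """Compare constraint names only; a crude stand-in for a structural audit."""
    ref_names, cand_names = set(ref["constraints"]), set(cand["constraints"])
    return {
        "missing": sorted(ref_names - cand_names),
        "extra": sorted(cand_names - ref_names),
    }

if __name__ == "__main__":
    print("same optimum:", optimal_value(reference) == optimal_value(candidate))
    print("only feasible in candidate:",
          sorted(feasible_points(candidate) - feasible_points(reference)))
    print("audit:", audit_constraints(reference, candidate))
```

Both models reach the same optimal value, so an objective-value check passes, yet the candidate admits the point (4, 0), which the reference forbids. A check on structure rather than on the optimum surfaces the dropped constraint.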

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@article{li2026optargus,
  title = {OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling},
  author = {Zhong Li and Zihan Guo and Xiaohan Lu and Juntao Wang and Jie Song and Chao Shen and Jiageng Wu and Mingyang Sun},
  year = {2026},
  abstract = {Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization semantics. We formulate this issue as \emph{optimization-modeling hallucination detection}, namely structural consistency auditing over the problem description, symbolic model, and sol},
  url = {https://arxiv.org/abs/2605.11738},
  keywords = {cs.AI},
  eprint = {2605.11738},
  archiveprefix = {arXiv},
}
