Paper Detail

OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling

Zhong Li, Zihan Guo, Xiaohan Lu, Juntao Wang, Jie Song, Chao Shen, Jiageng Wu, Mingyang Sun

arxiv Score 17.3

Published 2026-05-12 · First seen 2026-05-13

Research Track A · General AI

Abstract

Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization semantics. We formulate this issue as \emph{optimization-modeling hallucination detection}, namely structural consistency auditing over the problem description, symbolic model, and solver implementation. We develop, to our knowledge, the first fine-grained hallucination taxonomy specifically for optimization modeling, spanning objective, variable, constraint, and implementation failures. We use this taxonomy to design OptArgus, a multi-agent detector with conductor routing, specialist auditors, and evidence consolidation. To evaluate this setting, we introduce a three-part benchmark suite with $484$ clean artifacts, $1266$ controlled injected artifacts, and $6292$ natural LLM-generated artifacts. Against a matched single-agent baseline, OptArgus produces fewer false alarms on clean artifacts, more accurate top-ranked localization on controlled single-error cases, and stronger detection on natural model outputs. Together, these contributions turn optimization-modeling hallucination detection into a concrete empirical problem and suggest that modular, taxonomy-grounded auditing is a practical route to more reliable optimization modeling.

Workflow Status

Review status
pending
Role
unreviewed
Read priority
now
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{li2026optargus,
  title = {OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling},
  author = {Zhong Li and Zihan Guo and Xiaohan Lu and Juntao Wang and Jie Song and Chao Shen and Jiageng Wu and Mingyang Sun},
  year = {2026},
  abstract = {Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization semantics. We formulate this issue as \textbackslash{}emph\{optimization-modeling hallucination detection\}, namely structural consistency auditing over the problem description, symbolic model, and sol},
  url = {https://arxiv.org/abs/2605.11738},
  keywords = {cs.AI},
  eprint = {2605.11738},
  archiveprefix = {arXiv},
}

Metadata

{}