Paper Detail

The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications

Zhenyu Zhao, Aparna Balagopalan, Adi Agrawal, Dilshoda Yergasheva, Waseem Alshikh, Daniel M. Bikel

arxiv Score 11.3

Published 2026-04-27 · First seen 2026-04-28

General AI

Abstract

Given the increased use of LLMs in financial systems today, it becomes important to evaluate the safety and robustness of such systems. One failure mode that LLMs frequently display in general-domain settings is sycophancy: models prioritize agreement with expressed user beliefs over correctness, leading to decreased accuracy and trust. In this work, we focus on evaluating the sycophancy that LLMs display in agentic financial tasks. Our findings are three-fold. First, we find that models show only low to modest drops in performance in the face of user rebuttals or contradictions to the reference answer, which distinguishes the sycophancy that models display in financial agentic settings from findings in prior work. Second, we introduce a suite of tasks that test for sycophancy induced by user preference information that contradicts the reference answer, and find that most models fail in the presence of such inputs. Lastly, we benchmark different modes of recovery, such as input filtering with a pretrained LLM.
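
The abstract reports "drops in performance in the face of user rebuttals" without stating the metric. As a minimal sketch, assuming each task is scored once without and once with a contradicting user message, two plausible measures are the accuracy drop and the share of initially correct answers that flip. The names `Trial`, `accuracy_drop`, and `flip_rate` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    """One task scored twice (hypothetical record layout, not from the paper)."""
    correct_baseline: bool        # answer matched the reference with no user pushback
    correct_after_rebuttal: bool  # answer matched the reference after the user contradicted it


def accuracy_drop(trials: Sequence[Trial]) -> float:
    """Baseline accuracy minus accuracy after the rebuttal."""
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    baseline = sum(t.correct_baseline for t in trials) / n
    after = sum(t.correct_after_rebuttal for t in trials) / n
    return baseline - after


def flip_rate(trials: Sequence[Trial]) -> float:
    """Fraction of initially correct answers that became incorrect after the rebuttal."""
    initially_correct = [t for t in trials if t.correct_baseline]
    if not initially_correct:
        return 0.0
    flipped = sum(not t.correct_after_rebuttal for t in initially_correct)
    return flipped / len(initially_correct)


# Illustrative data only; these are not results from the paper.
example_trials = [
    Trial(True, True),
    Trial(True, False),   # model abandoned a correct answer
    Trial(False, False),
    Trial(True, True),
]
print(accuracy_drop(example_trials), flip_rate(example_trials))
```

The flip rate conditions on the model having been right in the first place, so it separates capitulation from tasks the model could not solve anyway; the paper may define its measures differently.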

BibTeX

@article{zhao2026price,
  title = {The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications},
  author = {Zhenyu Zhao and Aparna Balagopalan and Adi Agrawal and Dilshoda Yergasheva and Waseem Alshikh and Daniel M. Bikel},
  year = {2026},
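  abstract = {Given the increased use of LLMs in financial systems today, it becomes important to evaluate the safety and robustness of such systems. One failure mode that LLMs frequently display in general-domain settings is sycophancy: models prioritize agreement with expressed user beliefs over correctness, leading to decreased accuracy and trust. In this work, we focus on evaluating the sycophancy that LLMs display in agentic financial tasks. Our findings are three-fold. First, we find that models show only low to modest drops in performance in the face of user rebuttals or contradictions to the reference answer, which distinguishes the sycophancy that models display in financial agentic settings from findings in prior work. Second, we introduce a suite of tasks that test for sycophancy induced by user preference information that contradicts the reference answer, and find that most models fail in the presence of such inputs. Lastly, we benchmark different modes of recovery, such as input filtering with a pretrained LLM.},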
  url = {https://arxiv.org/abs/2604.24668},
  keywords = {cs.AI, cs.LG},
  eprint = {2604.24668},
  archiveprefix = {arXiv},
}
