Latent Preference Modeling for Cross-Session Personalized Tool Calling

Abstract

Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate-verify-refine loop, PRefine extracts reusable constraints from history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures the reasons behind user choices, not just the choices themselves.
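
The abstract describes the generate-verify-refine loop only at a high level, so the following is a toy sketch of the general idea rather than the authors' implementation. Every name here (`Hypothesis`, `generate`, `verify`, `refine`, `complete_call`, the `threshold` value, and the restaurant-style slots) is hypothetical, and a simple agreement ratio over past calls stands in for whatever verification PRefine actually performs. It also stores only preferred values, whereas the paper argues that memory should capture the reasons behind choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    slot: str   # tool argument the assumed preference applies to
    value: str  # value the user is assumed to prefer for that argument


def generate(history):
    """Propose one candidate hypothesis per (slot, value) pair seen in past calls."""
    return {Hypothesis(s, v) for call in history for s, v in call.items()}


def verify(h, history):
    """Fraction of past calls that set h.slot and agree with h.value."""
    relevant = [c for c in history if h.slot in c]
    return sum(c[h.slot] == h.value for c in relevant) / len(relevant)


def refine(hypotheses, history, threshold=0.6):
    """Keep, per slot, the best-supported hypothesis at or above the threshold."""
    best = {}
    for h in hypotheses:
        score = verify(h, history)
        if score >= threshold and score > best.get(h.slot, (None, 0.0))[1]:
            best[h.slot] = (h, score)
    return {slot: h.value for slot, (h, _) in best.items()}


def complete_call(partial_args, memory, required):
    """Fill required arguments the user omitted, using the compact memory."""
    filled = dict(partial_args)
    for slot in required:
        if slot not in filled and slot in memory:
            filled[slot] = memory[slot]
    return filled


# Hypothetical tool calls recorded in three earlier sessions.
history = [
    {"cuisine": "vegan", "price": "low"},
    {"cuisine": "vegan", "price": "mid"},
    {"cuisine": "vegan"},
]

# Only the consistently supported preference survives refinement.
memory = refine(generate(history), history)

# New under-specified request: the user names a city and nothing else.
call = complete_call({"city": "Seoul"}, memory, ["city", "cuisine", "price"])
```

In this sketch the agent carries the small `memory` dictionary into a new session instead of the full dialogue history, which is the kind of saving the abstract's token figure refers to. The `price` argument stays unfilled because no single value is supported often enough, so a real agent would still need to ask the user or fall back to a default.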

BibTeX

@misc{yoon2026latent,
  title = {Latent Preference Modeling for Cross-Session Personalized Tool Calling},
  author = {Yejin Yoon and Minseo Kim and Taeuk Kim},
  year = {2026},
  abstract = {Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PR},
  url = {https://huggingface.co/papers/2604.17886},
  keywords = {tool-augmented agents, API execution, personalized tool calling, MPT benchmark, PRefine, test-time memory augmentation, generate--verify--refine loop, user preferences, multi-session dialogues, preference recall, preference induction, preference transfer, huggingface daily},
  eprint = {2604.17886},
  archiveprefix = {arXiv},
}
