Paper Detail

LLM Anonymization Against Agentic Re-Identification

Ziwen Li, Jianing Wen, Tianshi Li

huggingface Score 7.5

Published 2026-06-01 · First seen 2026-06-06

General AI

Abstract

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry the text's downstream analytic value. Existing defenses either remove explicit identifiers, perturb text for formal privacy, or test rewritten text against non-web inference models, leaving underexplored the operating region between resistance to agentic web-search re-identification and utility retention. We introduce AURA (Anonymization with Utility-Retention Adaptation), an LLM-powered mask-reconstruct framework that decouples privacy localization from utility-preserving reconstruction and selects candidates with adversarial privacy and utility-retention checks. We evaluate AURA on real-user interview transcripts using re-identification attacks carried out by web-search agents, along with a utility evaluation based on interviewee-profile facts, codebook facts, and the joint contextual utility grid. Our results show that AURA improves the privacy-utility frontier in two ways: an adaptive privacy scope strengthens resistance to agentic re-identification, and the mask-reconstruct anonymization method better preserves contextual utility under a fixed privacy scope.
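
The mask-reconstruct-select loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `Candidate` type, the substring-based risk and utility scores, and the example sentence are all hypothetical stand-ins. In the paper, localization and reconstruction are LLM-powered and the privacy check is an agentic web-search attack; here each is replaced by a trivial placeholder so the control flow is visible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str
    privacy_risk: float  # toy score in [0, 1]; higher means easier to re-identify
    utility: float       # toy score in [0, 1]; fraction of utility facts retained


def mask(text: str, sensitive_spans: List[str]) -> str:
    """Privacy localization (stand-in): replace each flagged span with [MASK]."""
    for span in sensitive_spans:
        text = text.replace(span, "[MASK]")
    return text


def reconstruct(masked: str, fillers: List[str]) -> str:
    """Reconstruction (stand-in): fill each [MASK] in order with a replacement."""
    for filler in fillers:
        masked = masked.replace("[MASK]", filler, 1)
    return masked


def fraction_present(text: str, items: List[str]) -> float:
    """Fraction of items that appear in text (case-insensitive substring match)."""
    lowered = text.lower()
    return sum(item.lower() in lowered for item in items) / len(items)


def select(candidates: List[Candidate], max_risk: float) -> Optional[Candidate]:
    """Keep candidates that pass the privacy check, then pick the highest utility."""
    safe = [c for c in candidates if c.privacy_risk <= max_risk]
    return max(safe, key=lambda c: c.utility) if safe else None


def anonymize(text, sensitive_spans, filler_options, cues, facts, max_risk=0.0):
    """Toy pipeline: mask once, reconstruct several candidates, score, select."""
    masked = mask(text, sensitive_spans)
    candidates = []
    for fillers in filler_options:
        rewritten = reconstruct(masked, fillers)
        candidates.append(
            Candidate(
                text=rewritten,
                privacy_risk=fraction_present(rewritten, cues),  # placeholder attacker
                utility=fraction_present(rewritten, facts),      # placeholder utility check
            )
        )
    return select(candidates, max_risk)


# Hypothetical example input, not taken from the paper's data.
TEXT = "I work as a nurse at St. Mary's Hospital in Leeds."
SPANS = ["St. Mary's Hospital", "Leeds"]
OPTIONS = [
    ["a workplace", "somewhere"],
    ["St. Mary's Hospital", "a city in England"],
    ["a hospital", "a city in England"],
]
FACTS = ["nurse", "hospital"]

best = anonymize(TEXT, SPANS, OPTIONS, cues=SPANS, facts=FACTS)
```

Separating `mask` from `reconstruct` mirrors the decoupling the abstract describes: the set of masked spans (the privacy scope) can be changed without touching how replacements are generated, and the selection step only sees scored candidates.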

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@misc{li2026llm,
  title = {LLM Anonymization Against Agentic Re-Identification},
  author = {Ziwen Li and Jianing Wen and Tianshi Li},
  year = {2026},
  abstract = {Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry downstream analytic value of the text. Existing defenses either remove explicit identifiers, perturb text for formal privacy, or test rewritten text against non-web inference models, leaving underexplored the operating region between resistance to agentic web-search re-identification and utility retentio},
  url = {https://huggingface.co/papers/2605.30848},
  keywords = {LLM-powered, mask-reconstruct, anonymization, agentic web-search, re-identification, privacy-utility frontier, adaptive privacy scope, contextual utility, code available, huggingface daily},
  eprint = {2605.30848},
  archiveprefix = {arXiv},
}
