Paper Detail

Auditing Training Data in Domain-adapted LLMs: LoRA-MINT

Gonzalo Mancera, Daniel DeAlcala, Aythami Morales, Julian Fierrez, Ruben Tolosana, Francisco Jurado

Browse

Workflow Queues

arxiv Score 9.3

Published 2026-06-05 · First seen 2026-06-09

General AI

Open paper source

Abstract

We present LoRA-MINT, a new methodology for Membership Inference Test (MINT) applied to recent Large Language Models (LLMs) fine-tuned for specific Natural Language Processing (NLP) tasks through Low-Rank Adaptation (LoRA). The primary goal is to assess whether individual samples were part of the training data of these adapted models, providing a useful auditing tool for the management of intellectual property and sensitive data. Our analysis explores the relationship between model perplexity and membership status, providing a systematic framework for estimating data exposure in fine-tuned LLMs. We conducted experiments on four models and three benchmark datasets, obtaining precision values in determining if given data were used for training ranging from 0.77 to 0.92, which outperform state-of-the-art baselines and demonstrate the robustness and generality of the proposed method. In general, our findings underscore the potential of LoRA-MINT as an effective and scalable framework for auditing LLMs, improving transparency, and fostering the ethical and responsible deployment of AI and NLP technologies. For the sake of concreteness and current relevance, our discussion and experiments are centered on LoRAadjusted LLMs, but note that most of the presented methodology is easily applicable for auditing training data given any other technique for adapting LLMs or, more generally, any other domain-adapted AI models.

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@article{mancera2026auditing,
  title = {Auditing Training Data in Domain-adapted LLMs: LoRA-MINT},
  author = {Gonzalo Mancera and Daniel DeAlcala and Aythami Morales and Julian Fierrez and Ruben Tolosana and Francisco Jurado},
  year = {2026},
  abstract = {We present LoRA-MINT, a new methodology for Membership Inference Test (MINT) applied to recent Large Language Models (LLMs) fine-tuned for specific Natural Language Processing (NLP) tasks through Low-Rank Adaptation (LoRA). The primary goal is to assess whether individual samples were part of the training data of these adapted models, providing a useful auditing tool for the management of intellectual property and sensitive data. Our analysis explores the relationship between model perplexity an},
  url = {https://arxiv.org/abs/2606.06946},
  keywords = {cs.CL, cs.AI},
  eprint = {2606.06946},
  archiveprefix = {arXiv},
}

Metadata

{}