Paper Detail

Helpful or Harmful? Evaluating LLM-Assisted Vulnerability Patching via a Human Study

Giulian Biolo, Michael Tezza, Yuanjun Gong, Fabio Massacci

arXiv score 8.2

Published 2026-06-24 · First seen 2026-06-25

General AI

Abstract

Software vulnerability remediation is a cognitively demanding task that requires specialized security expertise, which general developers often lack. Meanwhile, tools assisted by Large Language Models (LLMs) show potential in vulnerability detection, localization, and repair. [Hypothesis:] While LLM assistance is hypothesized to accelerate patching, it also risks introducing hallucinations or insecure code, making superficial repairs more likely: patches that pass the standard functionality checks but fail security validation. [Objective:] We aim to present an empirical experiment that assesses LLM-assisted vulnerability patching against manual debugging, with human participants in real-world scenarios. [Method:] We plan to conduct a controlled experiment using a Balanced Crossover design. To this end, we have developed a WebApp for code execution and integrated hidden Ghost Tests that verify patch integrity beyond the visible functional requirements. The experiment comprises training and evaluation scenarios. We will evaluate remediation speed, remediation efficacy on both standard functionality tests and security tests, and participant perception. [Pilot Study:] A pilot experiment with a small sample of participants has been conducted, providing insights for the main study.
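The Ghost Test idea and the crossover assignment can be illustrated with a small Python sketch. It is hypothetical and is not the authors' WebApp or test harness: the path-traversal scenario, the `BASE` directory, the payloads, and every function name (`superficial_patch`, `robust_patch`, `ghost_test`, `classify`, `assign_sequences`) are assumptions chosen for illustration. A patch that strips `../` in a single pass passes the visible functional test but is caught by a hidden traversal test, which is the kind of superficial repair the hypothesis describes. `assign_sequences` shows a simple two-period AB/BA assignment, one common form of balanced crossover; the paper's actual design may counterbalance more factors.

```python
# Hypothetical sketch: not the authors' WebApp, tests, or experimental design.
import posixpath
import random
from collections.abc import Callable
from enum import Enum

BASE = "/srv/files"  # hypothetical directory the patched code must stay inside


class Outcome(Enum):
    SECURE = "secure"            # passes visible tests and hidden Ghost Tests
    SUPERFICIAL = "superficial"  # passes visible tests, fails a Ghost Test
    BROKEN = "broken"            # fails the visible functional tests


def superficial_patch(name: str) -> str:
    # Strips "../" in one pass; "....//" collapses back into "../".
    return posixpath.normpath(posixpath.join(BASE, name.replace("../", "")))


def robust_patch(name: str) -> str:
    # Normalizes the full path first, then rejects anything outside BASE.
    path = posixpath.normpath(posixpath.join(BASE, name))
    if not path.startswith(BASE + "/"):
        raise ValueError("path escapes base directory")
    return path


def functional_test(resolve: Callable[[str], str]) -> bool:
    # Visible requirement: an ordinary file name still resolves correctly.
    return resolve("report.txt") == BASE + "/report.txt"


def ghost_test(resolve: Callable[[str], str]) -> bool:
    # Hidden security check: traversal payloads must be rejected or contained.
    for payload in ("../etc/passwd", "....//etc/passwd"):
        try:
            path = resolve(payload)
        except ValueError:
            continue  # rejecting the input is acceptable
        if not path.startswith(BASE + "/"):
            return False
    return True


def classify(resolve: Callable[[str], str]) -> Outcome:
    if not functional_test(resolve):
        return Outcome.BROKEN
    return Outcome.SECURE if ghost_test(resolve) else Outcome.SUPERFICIAL


def assign_sequences(participants: list[str], seed: int = 0) -> dict[str, tuple[str, str]]:
    # AB/BA crossover: every participant works under both conditions, and the
    # two orders are split as evenly as the participant count allows.
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    orders = [("LLM-assisted", "manual"), ("manual", "LLM-assisted")]
    return {p: orders[i % 2] for i, p in enumerate(shuffled)}


if __name__ == "__main__":
    print(classify(superficial_patch).value)  # superficial
    print(classify(robust_patch).value)       # secure
    print(assign_sequences(["P1", "P2", "P3", "P4"]))
```

In this sketch, rejecting out-of-base paths after normalization, rather than deleting suspicious substrings, is what lets `robust_patch` survive the second payload; only a hidden test exercising that payload tells the two patches apart.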

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled in yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.
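As a hedged illustration of that step, here is a Python sketch that merges notes into an existing `papers.jsonl` entry. It assumes each line is a JSON object identified by an `eprint` key (mirroring the BibTeX `eprint` field, 2606.25973); that key, the value types of the four note fields, and the helper name `add_notes` are hypothetical, since the tool's actual schema is not documented on this page.

```python
# Hypothetical sketch: the papers.jsonl record layout (an "eprint" key per line) is assumed.
import json
import tempfile
from pathlib import Path

NOTE_FIELDS = ("summary_sections", "why_relevant", "claim_impact", "next_action")


def add_notes(path: Path, eprint: str, notes: dict) -> bool:
    """Merge notes into every line whose assumed "eprint" key matches.

    Returns True if at least one record was updated.
    """
    unknown = set(notes) - set(NOTE_FIELDS)
    if unknown:
        raise KeyError(f"unsupported note fields: {sorted(unknown)}")

    lines = path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    found = False
    for record in records:
        if record.get("eprint") == eprint:
            record.update(notes)
            found = True
    if found:
        # JSON Lines: one compact JSON object per line.
        body = "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
        path.write_text(body, encoding="utf-8")
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "papers.jsonl"
        path.write_text(json.dumps({"eprint": "2606.25973"}) + "\n", encoding="utf-8")
        add_notes(path, "2606.25973", {"next_action": "<your next step>"})
        print(path.read_text(encoding="utf-8"))
```

Rejecting unknown field names keeps typos (for example `next_actions`) from silently producing notes the view would never display.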

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@article{biolo2026helpful,
  title = {Helpful or Harmful? Evaluating LLM-Assisted Vulnerability Patching via a Human Study},
  author = {Giulian Biolo and Michael Tezza and Yuanjun Gong and Fabio Massacci},
  year = {2026},
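  abstract = {Software vulnerability remediation is a cognitively demanding task that requires specialized security expertise often lacking in general developers. In the meantime, Large Language Models (LLMs) assisted tools show potential in vulnerability detection, location, and repair tasks. [Hypothesis:] While LLM-assistance is hypothesized to accelerate patching, it also risks introducing hallucinations or insecure code, leading to a higher likelihood of generating superficial repairs that bypass the standard functionality checks but fail the security validation. [Objective:] We aim to present an empirical experiment, unveiling the capability of LLM-assisted vulnerability patching compared to manual debugging on human participants in real-world scenarios. [Method:] We plan to conduct a controlled experiment using a Balanced Crossover design. For that, we have developed a WebApp for code execution and integrated hidden Ghost Tests to verify patch integrity beyond visible functional requirements. The experiment involves training and evaluation scenarios. The remediation speed, remediation efficacy for both standard functionality tests and security tests, and participant perception will be evaluated. [Pilot Study:] A pilot experiment with a small sample of participants has been conducted, providing insights for the following study.},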
  url = {https://arxiv.org/abs/2606.25973},
  keywords = {cs.SE, cs.AI},
  eprint = {2606.25973},
  archiveprefix = {arXiv},
}