Paper Detail

Discrete Diffusion Language Models for Interactive Radiology Report Drafting

Max Van Puyvelde, Halil Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert

huggingface Score 9.8

Published 2026-07-01 · First seen 2026-07-03

General AI

Abstract

Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, remain almost entirely autoregressive. We adapt a mixture-of-experts diffusion language model, DiffusionGemma-26B, and benchmark it against its same-size AR sibling Gemma-4-26B under an identical LoRA recipe on medical visual question answering datasets, scored by a verbosity-robust LLM judge. Diffusion matches or exceeds AR on every dataset, and the fine-tuned model (3.8B active parameters) is competitive with frontier vision-language models; its decoding is also 3.5-4.4x faster. Beyond this parity, the diffusion model offers a drafting capability that AR lacks: any-order infill. Because the canvas is denoised bidirectionally, a radiologist can pin report fragments in place and have the model fill in the text between them. This operation is native to diffusion, whereas autoregressive models, which condition only on preceding tokens, handle it poorly. The capability suits real reports, which are often terse or inconsistent across clinicians and institutions.
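
The any-order infill idea can be illustrated with a toy sketch of confidence-based iterative unmasking, a decoding scheme commonly used with masked diffusion language models; the abstract does not say that DiffusionGemma decodes exactly this way. Pinned clinician fragments stay fixed, every masked slot is predicted in parallel from both its left and right neighbors, and only the most confident predictions are committed at each step. The bigram "denoiser", the tiny corpus, the `MASK` token, and the names `predict` and `infill` are all hypothetical stand-ins for a real model.

```python
from collections import Counter

MASK = "<mask>"  # hypothetical mask token

# Hypothetical toy corpus standing in for what a trained model has learned.
CORPUS = [
    "no acute cardiopulmonary abnormality",
    "heart size is normal",
    "lungs are clear bilaterally",
    "no pleural effusion or pneumothorax",
]


def bigram_counts(corpus):
    """Count adjacent token pairs (a, b) across the corpus."""
    counts = Counter()
    for line in corpus:
        toks = line.split()
        for a, b in zip(toks, toks[1:]):
            counts[(a, b)] += 1
    return counts


BIGRAMS = bigram_counts(CORPUS)
VOCAB = sorted({tok for line in CORPUS for tok in line.split()})


def predict(canvas, i):
    """Propose a token for masked slot i from BOTH neighbors, plus a confidence."""
    left = canvas[i - 1] if i > 0 else None
    right = canvas[i + 1] if i + 1 < len(canvas) else None
    scores = {}
    for w in VOCAB:
        s = 0
        if left not in (None, MASK):
            s += BIGRAMS[(left, w)]   # evidence from the left context
        if right not in (None, MASK):
            s += BIGRAMS[(w, right)]  # evidence from the right context
        scores[w] = s
    # Highest score wins; ties broken alphabetically so the sketch is deterministic.
    best = min(VOCAB, key=lambda w: (-scores[w], w))
    total = sum(scores.values())
    return best, (scores[best] / total if total else 0.0)


def infill(canvas, per_step=1):
    """Fill every MASK slot; pinned (non-MASK) tokens are never changed."""
    if per_step < 1:
        raise ValueError("per_step must be at least 1")
    canvas = list(canvas)
    while MASK in canvas:
        masked = [i for i, tok in enumerate(canvas) if tok == MASK]
        # Predict all masked slots in parallel from the current canvas...
        proposals = {i: predict(canvas, i) for i in masked}
        # ...then commit only the most confident ones and re-predict the rest.
        ranked = sorted(masked, key=lambda i: (-proposals[i][1], i))
        for i in ranked[:per_step]:
            canvas[i] = proposals[i][0]
    return canvas


if __name__ == "__main__":
    draft = ["lungs", MASK, "clear", MASK, "no", MASK, "effusion"]
    print(" ".join(infill(draft)))
    # lungs are clear bilaterally no pleural effusion
```

In this toy, the slot between the pinned "no" and "effusion" shows why bidirectional context matters: from the left context alone, "acute" and "pleural" tie, and the right-hand fragment breaks the tie in favor of "pleural". A real model would replace the bigram lookup with a forward pass over the whole canvas.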

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.
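
The hint above says this view reads per-paper notes from `papers.jsonl`. A minimal sketch of adding them, assuming the file is standard JSON Lines (one object per line) and that a record can be matched on an `eprint` key holding the arXiv identifier (2607.01436); the matching key, the value types, and the helper name `add_notes` are all hypothetical, not part of any documented tool.

```python
import json
from pathlib import Path

# Field names taken from the page's hint; everything else here is assumed.
NOTE_FIELDS = {"summary_sections", "why_relevant", "claim_impact", "next_action"}


def add_notes(path, eprint, notes):
    """Merge note fields into the record whose (assumed) "eprint" key matches."""
    unknown = set(notes) - NOTE_FIELDS
    if unknown:
        raise ValueError(f"unexpected note fields: {sorted(unknown)}")
    path = Path(path)
    lines = path.read_text(encoding="utf-8").splitlines()
    # Assumption: one JSON object per non-empty line (standard JSON Lines).
    records = [json.loads(line) for line in lines if line.strip()]
    matches = [rec for rec in records if rec.get("eprint") == eprint]
    if not matches:
        raise KeyError(f"no record with eprint {eprint!r}")
    for rec in matches:
        rec.update(notes)
    # Rewrite the whole file, one object per line, preserving record order.
    path.write_text(
        "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records),
        encoding="utf-8",
    )
```

For example, `add_notes("papers.jsonl", "2607.01436", {"next_action": "Read the infill experiments"})` would fill the empty "Next action" field, provided the assumed schema holds.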

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@misc{puyvelde2026discrete,
  title = {Discrete Diffusion Language Models for Interactive Radiology Report Drafting},
  author = {Max Van Puyvelde and Halil Ibrahim Gulluk and Wim Van Criekinge and Olivier Gevaert},
  year = {2026},
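  abstract = {Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, remain almost entirely autoregressive. We adapt a mixture-of-experts diffusion language model, DiffusionGemma-26B, and benchmark it against its same-size AR sibling Gemma-4-26B under an identical LoRA recipe on medical visual question answering datasets, scored by a verbosity-robust LLM judge. Diffusion matches or exceeds AR on all of them, and the finetuned model (3.8B active) is competitive with frontier vision-language models; its decoding is also 3.5-4.4x faster. Beyond this parity, the diffusion model offers a drafting capability AR lacks: any-order infill. Because the canvas is denoised bidirectionally, a radiologist can fix report fragments and have the model fill the text between them, an operation inherent to diffusion but not to autoregression, which is subpar at it. This suits real reports, which are often terse or inconsistent across clinicians and institutions.},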
  url = {https://huggingface.co/papers/2607.01436},
  keywords = {diffusion language models, autoregressive generation, medical foundation models, mixture-of-experts, DiffusionGemma-26B, Gemma-4-26B, LoRA, medical visual question answering, LLM judge, bidirectional denoising, drafting capability, infill, radiologist, clinical reports, huggingface daily},
  eprint = {2607.01436},
  archiveprefix = {arXiv},
}