Paper Detail

Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

Yanis Labrak, David Grünert, Séverin Baroudi, Jiyun Chun, Pawel Cyrta, Sergio Burdisso, Ahmed Hassoon, David Liu, Adam Rothschild, Reed Van Deusen, Petr Motlicek, Andrew Perrault, Ricard Marxer, Thomas Schaaf

arxiv Score 9.8

Published 2026-04-07 · First seen 2026-04-08

General AI

Abstract

Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to serve both as a training resource and as a controlled evaluation environment, and instantiate it for first-visit doctor-patient conversations with SOAP note generation as the task. The pipeline has three stages, persona-driven dialogue generation, multi-speaker audio synthesis with overlap/pause modeling, room acoustics, and sound events, and LLM-based reference SOAP note production, built entirely on open-weight models. We release 8,800 synthetic conversations with 1.3k hours of corresponding audio and reference notes. Evaluating current open-weight systems, we find that cascaded approaches still substantially outperform end-to-end models.

Workflow Status

Review status
pending
Role
unreviewed
Read priority
now
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@article{labrak2026generating,
  title = {Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization},
  author = {Yanis Labrak and David Grünert and Séverin Baroudi and Jiyun Chun and Pawel Cyrta and Sergio Burdisso and Ahmed Hassoon and David Liu and Adam Rothschild and Reed Van Deusen and Petr Motlicek and Andrew Perrault and Ricard Marxer and Thomas Schaaf},
  year = {2026},
  abstract = {Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to serve both as a training resource and as a controlled evaluation environment, and instantiate it for first-visit doctor-patient conversations with SOAP note generation as the task. },
  url = {https://arxiv.org/abs/2604.06138},
  keywords = {cs.SD, cs.AI},
  eprint = {2604.06138},
  archiveprefix = {arXiv},
}

Metadata

{}