Paper Detail

Multimodal LLMs are not all you need for Pediatric Speech Language Pathology

Darren Fürst, Sebastian Steindl, Ulrich Schäfer

arxiv Score 11.2

Published 2026-04-29 · First seen 2026-04-30

General AI

Abstract

Speech Sound Disorders (SSD) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular multi-task SLPHelmUltraSuitePlus benchmark. We propose a cascading approach that moves from binary classification to type and symptom classification. By fine-tuning Speech Representation Models (SRMs) and using targeted data augmentation, we mitigate biases found by previous work and improve upon all clinical tasks in the benchmark. We also apply our data augmentation approach to Automatic Speech Recognition (ASR). Our results demonstrate that SRMs consistently outperform the LLM-based state of the art across all evaluated tasks by a large margin. We publish our models and code to foster future research.
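
The cascade described in the abstract (binary detection first, then type, then symptom) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label strings, the toy threshold rules, and the choice to pass the predicted type into the symptom stage are all assumptions made for illustration. In the paper, the stages are fine-tuned Speech Representation Models rather than hand-written rules.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for whatever features or embeddings a model consumes.
Features = Sequence[float]


@dataclass
class CascadeResult:
    disordered: bool
    disorder_type: Optional[str] = None
    symptom: Optional[str] = None


def cascade_predict(
    features: Features,
    detect: Callable[[Features], bool],
    classify_type: Callable[[Features], str],
    classify_symptom: Callable[[Features, str], str],
) -> CascadeResult:
    """Run the stages in order; later stages run only if a disorder is detected."""
    if not detect(features):
        # Stage 1 says "typical speech": stop early, no type or symptom label.
        return CascadeResult(disordered=False)
    disorder_type = classify_type(features)              # Stage 2: disorder type
    symptom = classify_symptom(features, disorder_type)  # Stage 3: symptom
    return CascadeResult(True, disorder_type, symptom)


# Toy stage functions with made-up rules and labels (assumptions, not benchmark labels).
def toy_detect(features: Features) -> bool:
    return sum(features) / len(features) > 0.5


def toy_type(features: Features) -> str:
    return "type_a" if features[0] > 0.5 else "type_b"


def toy_symptom(features: Features, disorder_type: str) -> str:
    return f"{disorder_type}_symptom_1"


if __name__ == "__main__":
    print(cascade_predict([0.9, 0.8], toy_detect, toy_type, toy_symptom))
    print(cascade_predict([0.1, 0.2], toy_detect, toy_type, toy_symptom))
```

One consequence of this structure is that an error in the binary stage propagates: a sample wrongly judged typical never reaches the type or symptom stages.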

BibTeX

@article{frst2026multimodal,
  title = {Multimodal LLMs are not all you need for Pediatric Speech Language Pathology},
  author = {Darren Fürst and Sebastian Steindl and Ulrich Schäfer},
  year = {2026},
  abstract = {Speech Sound Disorders (SSD) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular multi-task SLPHelmUltraSuitePlus benchmark. We propose a cascading approach from binary classification to type, and symptom classification. By fine-tuning Speech Representation Models (SRM), and using targeted data augmentation we mitigate biases found by previous wor},
  url = {https://arxiv.org/abs/2604.26568},
  keywords = {cs.CL},
  eprint = {2604.26568},
  archiveprefix = {arXiv},
}
