Paper Detail

Dynamic Summary Generation for Interpretable Multimodal Depression Detection

Shiyu Teng, Jiaqing Liu, Hao Sun, Yu Li, Shurong Chai, Ruibo Hou, Tomoko Tateyama, Lanfen Lin, Yen-Wei Chen

arXiv · Score 8.3

Published 2026-04-13 · First seen 2026-04-14

General AI

Abstract

Depression remains widely underdiagnosed and undertreated because stigma and subjective symptom ratings hinder reliable screening. To address this challenge, we propose a coarse-to-fine, multi-stage framework that leverages large language models (LLMs) for accurate and interpretable detection. The pipeline performs binary screening, five-class severity classification, and continuous regression. At each stage, an LLM produces progressively richer clinical summaries that guide a multimodal fusion module integrating text, audio, and video features, yielding predictions with transparent rationale. The system then consolidates all summaries into a concise, human-readable assessment report. Experiments on the E-DAIC and CMDC datasets show significant improvements over state-of-the-art baselines in both accuracy and interpretability.
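The coarse-to-fine flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stage identifiers, the `summarize` and `predict` callables, and the feature layout are all hypothetical, and the stubs stand in for the LLM and the multimodal fusion module. The abstract does not say whether later stages are gated on the binary screening result, so the sketch simply runs all three stages in order.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

# Stage order follows the abstract; the identifiers themselves are hypothetical.
STAGES = ("binary_screening", "severity_5class", "score_regression")

Features = Dict[str, List[float]]  # modality name -> feature vector (assumed layout)


@dataclass
class StageResult:
    stage: str
    summary: str
    prediction: float


@dataclass
class Assessment:
    results: List[StageResult] = field(default_factory=list)

    def report(self) -> str:
        # Consolidate every stage summary into one readable report, one line per stage.
        return "\n".join(
            f"[{r.stage}] prediction={r.prediction:.2f} | {r.summary}"
            for r in self.results
        )


def run_pipeline(
    features: Features,
    summarize: Callable[[str, List[str]], str],
    predict: Callable[[str, Features, str], float],
) -> Assessment:
    """Run the stages in order; each summary sees the summaries produced before it."""
    assessment = Assessment()
    earlier: List[str] = []
    for stage in STAGES:
        summary = summarize(stage, list(earlier))       # stand-in for the LLM call
        prediction = predict(stage, features, summary)  # stand-in for summary-guided fusion
        assessment.results.append(StageResult(stage, summary, prediction))
        earlier.append(summary)
    return assessment


def stub_summarize(stage: str, earlier: List[str]) -> str:
    # Placeholder: a real system would prompt an LLM with the interview data.
    return f"{stage} summary built on {len(earlier)} earlier summaries"


def stub_predict(stage: str, features: Features, summary: str) -> float:
    # Placeholder fusion: average all features across text, audio, and video.
    return mean(v for vec in features.values() for v in vec)


if __name__ == "__main__":
    demo = {"text": [0.2, 0.4], "audio": [0.6], "video": [0.8]}
    print(run_pipeline(demo, stub_summarize, stub_predict).report())
```

Passing the earlier summaries into each `summarize` call is one simple way to model "progressively richer" summaries; the paper may realize this differently.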

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@article{teng2026dynamic,
  title = {Dynamic Summary Generation for Interpretable Multimodal Depression Detection},
  author = {Shiyu Teng and Jiaqing Liu and Hao Sun and Yu Li and Shurong Chai and Ruibo Hou and Tomoko Tateyama and Lanfen Lin and Yen-Wei Chen},
  year = {2026},
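  abstract = {Depression remains widely underdiagnosed and undertreated because stigma and subjective symptom ratings hinder reliable screening. To address this challenge, we propose a coarse-to-fine, multi-stage framework that leverages large language models (LLMs) for accurate and interpretable detection. The pipeline performs binary screening, five-class severity classification, and continuous regression. At each stage, an LLM produces progressively richer clinical summaries that guide a multimodal fusion module integrating text, audio, and video features, yielding predictions with transparent rationale. The system then consolidates all summaries into a concise, human-readable assessment report. Experiments on the E-DAIC and CMDC datasets show significant improvements over state-of-the-art baselines in both accuracy and interpretability.},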
  url = {https://arxiv.org/abs/2604.11334},
  keywords = {cs.AI},
  eprint = {2604.11334},
  archiveprefix = {arXiv},
}
