Paper Detail

Dynamic Summary Generation for Interpretable Multimodal Depression Detection

Shiyu Teng, Jiaqing Liu, Hao Sun, Yu Li, Shurong Chai, Ruibo Hou, Tomoko Tateyama, Lanfen Lin, Yen-Wei Chen

arXiv · Score 9.3

Published 2026-04-13 · First seen 2026-04-14

General AI

Abstract

Depression remains widely underdiagnosed and undertreated because stigma and subjective symptom ratings hinder reliable screening. To address this challenge, we propose a coarse-to-fine, multi-stage framework that leverages large language models (LLMs) for accurate and interpretable detection. The pipeline performs binary screening, five-class severity classification, and continuous regression. At each stage, an LLM produces progressively richer clinical summaries that guide a multimodal fusion module integrating text, audio, and video features, yielding predictions with transparent rationale. The system then consolidates all summaries into a concise, human-readable assessment report. Experiments on the E-DAIC and CMDC datasets show significant improvements over state-of-the-art baselines in both accuracy and interpretability.
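
The control flow described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the names `Pipeline`, `StageResult`, `build_report`, `stub_summarize`, and `stub_fuse` are hypothetical, the LLM and the fusion module are replaced by trivial stand-ins, and the assumption that each stage's summary may draw on the earlier summaries is one reading of "progressively richer".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical container for per-modality features (text, audio, video).
Features = Dict[str, List[float]]


@dataclass
class StageResult:
    stage: str         # "binary", "severity", or "regression"
    summary: str       # clinical summary produced for this stage
    prediction: float  # output of the fusion module for this stage


@dataclass
class Pipeline:
    # Both callables are placeholders for components the abstract only names.
    summarize: Callable[[str, str, List[str]], str]
    fuse: Callable[[str, Features, str], float]
    stages: tuple = ("binary", "severity", "regression")

    def run(self, transcript: str, features: Features) -> List[StageResult]:
        results: List[StageResult] = []
        for stage in self.stages:
            # Assumption: later summaries can build on earlier ones.
            earlier = [r.summary for r in results]
            summary = self.summarize(stage, transcript, earlier)
            # The summary guides fusion of the multimodal features.
            prediction = self.fuse(stage, features, summary)
            results.append(StageResult(stage, summary, prediction))
        return results


def build_report(results: List[StageResult]) -> str:
    # Consolidate all stage summaries into one readable report.
    return "\n".join(
        f"{r.stage}: {r.summary} (prediction={r.prediction})" for r in results
    )


def stub_summarize(stage: str, transcript: str, earlier: List[str]) -> str:
    # Stand-in for an LLM call; returns a fixed-format string.
    return f"{stage} summary #{len(earlier) + 1}"


def stub_fuse(stage: str, features: Features, summary: str) -> float:
    # Stand-in for the fusion module; simply sums all feature values.
    return sum(sum(values) for values in features.values())


results = Pipeline(stub_summarize, stub_fuse).run(
    "interview transcript",
    {"text": [0.1], "audio": [0.2], "video": [0.3]},
)
report = build_report(results)
```

Swapping the two stubs for a real LLM call and a trained fusion model would leave the loop unchanged; how the three stages are actually coupled is not specified in the abstract.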

BibTeX

@article{teng2026dynamic,
  title = {Dynamic Summary Generation for Interpretable Multimodal Depression Detection},
  author = {Shiyu Teng and Jiaqing Liu and Hao Sun and Yu Li and Shurong Chai and Ruibo Hou and Tomoko Tateyama and Lanfen Lin and Yen-Wei Chen},
  year = {2026},
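  abstract = {Depression remains widely underdiagnosed and undertreated because stigma and subjective symptom ratings hinder reliable screening. To address this challenge, we propose a coarse-to-fine, multi-stage framework that leverages large language models (LLMs) for accurate and interpretable detection. The pipeline performs binary screening, five-class severity classification, and continuous regression. At each stage, an LLM produces progressively richer clinical summaries that guide a multimodal fusion module integrating text, audio, and video features, yielding predictions with transparent rationale. The system then consolidates all summaries into a concise, human-readable assessment report. Experiments on the E-DAIC and CMDC datasets show significant improvements over state-of-the-art baselines in both accuracy and interpretability.},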
  url = {https://arxiv.org/abs/2604.11334},
  keywords = {cs.AI},
  eprint = {2604.11334},
  archiveprefix = {arXiv},
}
