MAJIC: Leveraging Articulatory Motion for Speech-based Emotion Recognition

Tanmay Srivastava, Paras Bhavnani, Benjir Alvee Islam, Shubham Jain

arxiv Score 6.3

Published 2026-06-16 · First seen 2026-06-17

General AI

Abstract

We introduce MAJIC, a multimodal emotion recognition system that leverages articulatory motion of the jaw and facial muscles for speech-based emotion recognition (SER). While most SER systems perform well on datasets with strongly expressed emotional speech of trained actors, their performance often degrades when emotional expressions become more subtle. We explore this challenge by engineering features from articulatory motion and integrating them with audio features using a multi-task learning framework. Our key insight is that emotion in speech manifests not only through vocal characteristics but also through distinct articulatory motions: jaw movements, facial muscle vibrations, and speech-induced vibrations. While audio captures features such as pitch and prosody, articulatory motion contains complementary information that is not present in audio alone. We evaluate our system on data collected from 20 participants across multiple sessions, 10 languages, and diverse scenarios, including prompted and conversational speech, showing its robustness across users and settings. MAJIC achieves 93% accuracy and 91% F1 score for emotion classification, outperforming strong audio-based baselines on our dataset.
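
As a rough illustration of the idea described in the abstract (combining audio features with articulatory-motion features under a multi-task objective), the following is a minimal sketch and not the authors' implementation. Everything concrete in it is an assumption made for illustration: the class name `FusionMultiTaskSketch`, the feature dimensions, fusion by simple concatenation, the existence of a single auxiliary task, and the `aux_weight` value. The abstract does not specify the architecture, the fusion strategy, or which auxiliary tasks are used.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Random weights (uniform, scaled by fan-in) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class FusionMultiTaskSketch:
    """Hypothetical model: shared encoder over fused features, two task heads."""

    def __init__(self, audio_dim, motion_dim, hidden_dim, n_emotions, n_aux, seed=0):
        rng = random.Random(seed)
        # Shared layer sees audio and motion features side by side.
        self.shared = init_layer(rng, audio_dim + motion_dim, hidden_dim)
        # Main head: emotion classes. Auxiliary head: an assumed second task.
        self.emotion_head = init_layer(rng, hidden_dim, n_emotions)
        self.aux_head = init_layer(rng, hidden_dim, n_aux)

    def forward(self, audio_feats, motion_feats):
        # Assumption: fusion is plain concatenation of the two feature vectors.
        fused = list(audio_feats) + list(motion_feats)
        hidden = relu(linear(fused, *self.shared))
        emotion_probs = softmax(linear(hidden, *self.emotion_head))
        aux_probs = softmax(linear(hidden, *self.aux_head))
        return emotion_probs, aux_probs


def multitask_loss(emotion_probs, aux_probs, emotion_label, aux_label, aux_weight=0.5):
    """Cross-entropy on the main task plus a weighted auxiliary cross-entropy."""
    main = -math.log(emotion_probs[emotion_label])
    aux = -math.log(aux_probs[aux_label])
    return main + aux_weight * aux


# Usage with made-up feature values (4 audio features, 3 motion features).
model = FusionMultiTaskSketch(audio_dim=4, motion_dim=3, hidden_dim=8, n_emotions=4, n_aux=2)
emotion_probs, aux_probs = model.forward([0.1, 0.2, 0.3, 0.4], [0.5, 0.1, -0.2])
loss = multitask_loss(emotion_probs, aux_probs, emotion_label=2, aux_label=0)
```

The shared hidden layer is what makes this multi-task: both heads read the same representation, so a gradient from the auxiliary loss would also shape the features used for emotion classification. Training code is omitted; the sketch covers only the forward pass and the combined loss.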

BibTeX

@article{srivastava2026majic,
  title = {MAJIC: Leveraging Articulatory Motion for Speech-based Emotion Recognition},
  author = {Tanmay Srivastava and Paras Bhavnani and Benjir Alvee Islam and Shubham Jain},
  year = {2026},
  abstract = {We introduce MAJIC, a multimodal emotion recognition system that leverages articulatory motion of the jaw and facial muscles for speech-based emotion recognition (SER). While most SER systems perform well on datasets with strongly expressed emotional speech of trained actors, their performance often degrades when emotional expressions become more subtle. We explore this challenge by engineering features from articulatory motion and integrating them with audio features using a multi-task learning framework.},
  url = {https://arxiv.org/abs/2606.18228},
  keywords = {cs.HC},
  eprint = {2606.18228},
  archiveprefix = {arXiv},
}