Paper Detail

Training a Student Expert via Semi-Supervised Foundation Model Distillation

Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, Zhengzhong Tu

huggingface Score 3.5

Published 2026-04-04 · First seen 2026-04-11

General AI

Abstract

Foundation models deliver strong perception but are often too computationally heavy to deploy, and adapting them typically requires costly annotations. We introduce a semi-supervised knowledge distillation (SSKD) framework that compresses pre-trained vision foundation models (VFMs) into compact experts using limited labeled and abundant unlabeled data, and instantiate it for instance segmentation where per-pixel labels are particularly expensive. The framework unfolds in three stages: (1) domain adaptation of the VFM(s) via self-training with contrastive calibration, (2) knowledge transfer through a unified multi-objective loss, and (3) student refinement to mitigate residual pseudo-label bias. Central to our approach is an instance-aware pixel-wise contrastive loss that fuses mask and class scores to extract informative negatives and enforce clear inter-instance margins. By maintaining this contrastive signal across both adaptation and distillation, we align teacher and student embeddings and more effectively leverage unlabeled images. On Cityscapes and ADE20K, our approximately 11× smaller student improves over its zero-shot VFM teacher(s) by +11.9 and +8.6 AP, surpasses adapted teacher(s) by +3.4 and +1.5 AP, and outperforms state-of-the-art SSKD methods on benchmarks.
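
To make the instance-aware pixel-wise contrastive loss more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the formula, so the InfoNCE-style form, the product used to fuse mask and class scores, the confidence-weighting of negatives, and every name below (`pixel_contrastive_loss`, `temperature`, and so on) are assumptions made for illustration only.

```python
import math


def _normalize(v):
    # L2-normalize a vector; fall back to norm 1.0 for an all-zero vector.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_contrastive_loss(embeddings, instance_ids, mask_scores,
                           class_scores, temperature=0.1):
    """Hypothetical InfoNCE-style loss over sampled pixel embeddings.

    embeddings   : list of per-pixel feature vectors
    instance_ids : (pseudo-)instance label of each pixel
    mask_scores, class_scores : per-pixel confidences in [0, 1]
    """
    z = [_normalize(e) for e in embeddings]
    # Assumption: fuse mask and class scores by multiplication.
    conf = [m * c for m, c in zip(mask_scores, class_scores)]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n)
               if j != i and instance_ids[j] == instance_ids[i]]
        neg = [j for j in range(n) if instance_ids[j] != instance_ids[i]]
        if not pos or not neg:
            continue  # anchor needs at least one positive and one negative
        # Assumption: confident negatives contribute more to the denominator.
        neg_term = sum(conf[j] * math.exp(_dot(z[i], z[j]) / temperature)
                       for j in neg)
        for j in pos:
            pos_term = math.exp(_dot(z[i], z[j]) / temperature)
            total += -math.log(pos_term / (pos_term + neg_term))
            count += 1
    return total / count if count else 0.0


# Toy example: four pixels, two instances, uniform confidences.
pixels = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
scores = [0.9, 0.9, 0.9, 0.9]
separated = pixel_contrastive_loss(pixels, [0, 0, 1, 1], scores, scores)
mixed = pixel_contrastive_loss(pixels, [0, 1, 0, 1], scores, scores)
```

In this toy setup the loss is small when pixels of the same instance share similar embeddings (`separated`) and large when the instance labels cut across the embedding clusters (`mixed`), which is the margin-enforcing behavior the abstract describes. Under the same assumptions, the function could be applied to teacher embeddings during adaptation and to student embeddings during distillation.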

BibTeX

@misc{taghavi2026training,
  title = {Training a Student Expert via Semi-Supervised Foundation Model Distillation},
  author = {Pardis Taghavi and Tian Liu and Renjie Li and Reza Langari and Zhengzhong Tu},
  year = {2026},
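  abstract = {Foundation models deliver strong perception but are often too computationally heavy to deploy, and adapting them typically requires costly annotations. We introduce a semi-supervised knowledge distillation (SSKD) framework that compresses pre-trained vision foundation models (VFMs) into compact experts using limited labeled and abundant unlabeled data, and instantiate it for instance segmentation where per-pixel labels are particularly expensive. The framework unfolds in three stages: (1) domain adaptation of the VFM(s) via self-training with contrastive calibration, (2) knowledge transfer through a unified multi-objective loss, and (3) student refinement to mitigate residual pseudo-label bias. Central to our approach is an instance-aware pixel-wise contrastive loss that fuses mask and class scores to extract informative negatives and enforce clear inter-instance margins. By maintaining this contrastive signal across both adaptation and distillation, we align teacher and student embeddings and more effectively leverage unlabeled images. On Cityscapes and ADE20K, our approximately 11$\times$ smaller student improves over its zero-shot VFM teacher(s) by +11.9 and +8.6 AP, surpasses adapted teacher(s) by +3.4 and +1.5 AP, and outperforms state-of-the-art SSKD methods on benchmarks.},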
  url = {https://huggingface.co/papers/2604.03841},
  keywords = {vision foundation models, knowledge distillation, self-training, contrastive calibration, multi-objective loss, pseudo-label bias, instance-aware pixel-wise contrastive loss, mask scores, class scores, embedding alignment, huggingface daily},
  eprint = {2604.03841},
  archiveprefix = {arXiv},
}