Paper Detail
Tristan Kirscher, Alexandra Ertl, Klaus Maier-Hein, Xavier Coubez, Philippe Meyer, Sylvain Faisan
Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calibrated and difficult to interpret under such ambiguity. We present TwinTrack, a framework that addresses this gap through post-hoc calibration of ensemble segmentation probabilities to the empirical mean human response (MHR) -the fraction of expert annotators labeling a voxel as tumor. Calibrated probabilities are thus directly interpretable as the expected proportion of annotators assigning the tumor label, explicitly modeling inter-rater disagreement. The proposed post-hoc calibration procedure is simple and requires only a small multi-rater calibration set. It consistently improves calibration metrics over standard approaches when evaluated on the MICCAI 2025 CURVAS-PDACVI multi-rater benchmark.
No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.
No ranking explanation is available yet.
No tags.
@misc{kirscher2026twintrack,
title = {TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation},
author = {Tristan Kirscher and Alexandra Ertl and Klaus Maier-Hein and Xavier Coubez and Philippe Meyer and Sylvain Faisan},
year = {2026},
abstract = {Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calibrated and difficult to interpret under such ambiguity. We present TwinTrack, a framework that addresses this gap through post-hoc calibration of ensemble segmentation probabilities},
url = {https://huggingface.co/papers/2604.15950},
keywords = {ensemble segmentation, post-hoc calibration, empirical mean human response, inter-rater disagreement, probabilistic outputs, calibration metrics, huggingface daily},
eprint = {2604.15950},
archiveprefix = {arXiv},
}
{}