Paper Detail
Zhongying Deng, Cheng Tang, Ziyan Huang, Jiashi Lin, Ying Chen, Junzhi Ning, Chenglong Ma, Jiyao Liu, Wei Li, Yinghao Zhu, Shujian Gao, Yanyan Huang, Sibo Ju, Yanzhou Su, Pengcheng Chen, Wenhao Tang, Tianbin Li, Haoyu Wang, Yuanfeng Ji, Hui Sun, Shaobo Min, Liang Peng, Feilong Tang, Haochen Xue, Rulin Zhou, Chaoyang Zhang, Wenjie Li, Shaohao Rui, Weijie Ma, Xingyue Zhao, Yibin Wang, Kun Yuan, Zhaohui Lu, Shujun Wang, Jinjie Wei, Lihao Liu, Dingkang Yang, Lin Wang, Yulong Li, Haolin Yang, Yiqing Shen, Lequan Yu, Xiaowei Hu, Yun Gu, Yicheng Wu, Benyou Wang, Minghui Zhang, Angelica I. Aviles-Rivero, Qi Gao, Hongming Shan, Xiaoyu Ren, Fang Yan, Hongyu Zhou, Haodong Duan, Maosong Cao, Shanshan Wang, Bin Fu, Xiaomeng Li, Zhi Hou, Chunfeng Song, Lei Bai, Yuan Cheng, Yuandong Pu, Xiang Li, Wenhai Wang, Hao Chen, Jiaxin Zhuang, Songyang Zhang, Huiguang He, Mengzhang Li, Bohan Zhuang, Zhian Bai, Rongshan Yu, Liansheng Wang, Yukun Zhou, Xiaosong Wang, Xin Guo, Guanbin Li, Xiangru Lin, Dakai Jin, Mianxin Liu, Wenlong Zhang, Qi Qin, Conghui He, Yuqiang Li, Ye Luo, Nanqing Dong, Jie Xu, Wenqi Shao, Bo Zhang, Qiujuan Yan, Yihao Liu, Jun Ma, Zhi Lu, Yuewen Cao, Zongwei Zhou, Jianming Liang, Shixiang Tang, Qi Duan, Dongzhan Zhou, Chen Jiang, Yuyin Zhou, Yanwu Xu, Jiancheng Yang, Shaoting Zhang, Xiaohong Liu, Siqi Luo, Yi Xin, Chaoyu Liu, Haochen Wen, Xin Chen, Alejandro Lozano, Min Woo Sun, Yuhui Zhang, Yue Yao, Xiaoxiao Sun, Serena Yeung-Levy, Xia Li, Jing Ke, Chunhui Zhang, Zongyuan Ge, Ming Hu, Jin Ye, Zhifeng Li, Yirong Chen, Yu Qiao, Junjun He
Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the availability of large-scale, diverse, and high-quality datasets. In medical imaging, however, curating and assembling such datasets is highly challenging because it relies on clinical expertise and is subject to strict ethical and privacy constraints. The result is a scarcity of large-scale unified medical datasets, which hinders the development of powerful medical foundation models. In this work, we present the largest survey to date of medical image datasets, covering over 1,000 open-access datasets with a systematic catalog of their modalities, tasks, anatomies, annotations, limitations, and potential for integration. Our analysis exposes a landscape that is modest in scale, fragmented across narrowly scoped tasks, and unevenly distributed across organs and modalities, which in turn limits the utility of existing medical image datasets for developing versatile and robust medical foundation models. To turn fragmentation into scale, we propose a metadata-driven fusion paradigm (MDFP) that integrates public datasets with shared modalities or tasks, thereby transforming multiple small data silos into larger, more coherent resources. Building on MDFP, we release an interactive discovery portal that enables end-to-end, automated medical image dataset integration, and we compile all surveyed datasets into a unified, structured table that summarizes their key characteristics and provides reference links, offering the community an accessible and comprehensive repository. By charting the current terrain and offering a principled path to dataset consolidation, our survey provides a practical roadmap for scaling medical imaging corpora, supporting faster data discovery, more principled dataset creation, and more capable medical foundation models.
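The abstract describes MDFP only at a high level: public datasets that share a modality or task are integrated into larger resources. As a rough illustration of that idea, and not the authors' implementation, the minimal sketch below groups hypothetical metadata records by shared fields and reports which groups could be pooled. The record schema (`name`, `modality`, `task`, `num_images`), the dataset names, and the function names are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical metadata records; names, fields, and counts are illustrative only.
datasets = [
    {"name": "ds_a", "modality": "CT", "task": "segmentation", "num_images": 120},
    {"name": "ds_b", "modality": "CT", "task": "segmentation", "num_images": 300},
    {"name": "ds_c", "modality": "MRI", "task": "classification", "num_images": 80},
    {"name": "ds_d", "modality": "CT", "task": "classification", "num_images": 50},
]


def group_by_metadata(records, keys=("modality", "task")):
    """Bucket records by the tuple of values they hold for the given keys."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec)
    return dict(groups)


def fusion_candidates(records, keys=("modality", "task")):
    """Keep only groups with two or more datasets and pool their image counts."""
    candidates = {}
    for key, members in group_by_metadata(records, keys).items():
        if len(members) >= 2:
            candidates[key] = {
                "datasets": sorted(m["name"] for m in members),
                "total_images": sum(m["num_images"] for m in members),
            }
    return candidates


# Datasets sharing both modality and task.
by_modality_and_task = fusion_candidates(datasets)

# A looser grouping: datasets sharing only the modality.
by_modality = fusion_candidates(datasets, keys=("modality",))
```

Grouping on fewer keys yields larger but more heterogeneous pools. A real pipeline would also need to reconcile label vocabularies, file formats, and licenses, none of which this sketch attempts.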
@article{deng2026project,
title = {Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development},
author = {Zhongying Deng and Cheng Tang and Ziyan Huang and Jiashi Lin and Ying Chen and Junzhi Ning and Chenglong Ma and Jiyao Liu and Wei Li and Yinghao Zhu and Shujian Gao and Yanyan Huang and Sibo Ju and Yanzhou Su and Pengcheng Chen and Wenhao Tang and Tianbin Li and Haoyu Wang and Yuanfeng Ji and Hui Sun and Shaobo Min and Liang Peng and Feilong Tang and Haochen Xue and Rulin Zhou and Chaoyang Zhang and Wenjie Li and Shaohao Rui and Weijie Ma and Xingyue Zhao and Yibin Wang and Kun Yuan and Zhaohui Lu and Shujun Wang and Jinjie Wei and Lihao Liu and Dingkang Yang and Lin Wang and Yulong Li and Haolin Yang and Yiqing Shen and Lequan Yu and Xiaowei Hu and Yun Gu and Yicheng Wu and Benyou Wang and Minghui Zhang and Angelica I. Aviles-Rivero and Qi Gao and Hongming Shan and Xiaoyu Ren and Fang Yan and Hongyu Zhou and Haodong Duan and Maosong Cao and Shanshan Wang and Bin Fu and Xiaomeng Li and Zhi Hou and Chunfeng Song and Lei Bai and Yuan Cheng and Yuandong Pu and Xiang Li and Wenhai Wang and Hao Chen and Jiaxin Zhuang and Songyang Zhang and Huiguang He and Mengzhang Li and Bohan Zhuang and Zhian Bai and Rongshan Yu and Liansheng Wang and Yukun Zhou and Xiaosong Wang and Xin Guo and Guanbin Li and Xiangru Lin and Dakai Jin and Mianxin Liu and Wenlong Zhang and Qi Qin and Conghui He and Yuqiang Li and Ye Luo and Nanqing Dong and Jie Xu and Wenqi Shao and Bo Zhang and Qiujuan Yan and Yihao Liu and Jun Ma and Zhi Lu and Yuewen Cao and Zongwei Zhou and Jianming Liang and Shixiang Tang and Qi Duan and Dongzhan Zhou and Chen Jiang and Yuyin Zhou and Yanwu Xu and Jiancheng Yang and Shaoting Zhang and Xiaohong Liu and Siqi Luo and Yi Xin and Chaoyu Liu and Haochen Wen and Xin Chen and Alejandro Lozano and Min Woo Sun and Yuhui Zhang and Yue Yao and Xiaoxiao Sun and Serena Yeung-Levy and Xia Li and Jing Ke and Chunhui Zhang and Zongyuan Ge and Ming Hu and Jin Ye and Zhifeng Li and Yirong Chen and Yu Qiao and Junjun He},
year = {2026},
abstract = {Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the thrive of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, the curation and assembling of such medical datasets are highly challenging due to the reliance on clinical expertise and strict ethical and privacy constraints, resulting in a scarcity of large-scale unified medical datasets and hindering the development of powerful medical foundation mo},
url = {https://arxiv.org/abs/2603.27460},
keywords = {cs.CV, cs.AI},
eprint = {2603.27460},
archiveprefix = {arXiv},
}