Review Article

Self-Supervised Vision Transformers for Medical Image Segmentation with Limited Annotations

Abstract

Annotating medical images for segmentation is expensive and requires domain expertise. We propose MedSSL-ViT, a self-supervised pre-training framework for Vision Transformers (ViT) tailored to medical imaging. MedSSL-ViT combines masked image modeling with anatomy-aware contrastive learning, exploiting the structured nature of medical images. Pre-trained on 850K unlabeled chest X-rays and CT slices, the model achieves state-of-the-art segmentation performance on four downstream tasks using only 10% of the annotations: lung segmentation (Dice: 97.2%), cardiac chamber segmentation (Dice: 93.5%), liver tumor segmentation (Dice: 78.8%), and retinal vessel segmentation (Dice: 82.1%). With just 1% of the labels, MedSSL-ViT still outperforms fully supervised baselines trained on 100% of the labels by 2-5% Dice.
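
For readers unfamiliar with the metric, the Dice scores reported above measure the overlap between a predicted mask and a reference mask, Dice = 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch of that standard formula, assuming binary masks flattened to sequences of 0s and 1s. The function name `dice_score`, the toy masks, and the empty-mask convention are illustrative assumptions and are not taken from the paper's evaluation code.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks (values 0 or 1).

    Dice = 2 * |P intersect T| / (|P| + |T|).
    Returns 1.0 when both masks are empty; this is one common convention
    and an assumption here, not something stated in the paper.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks, flattened row by row (hypothetical data).
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
score = dice_score(pred_mask, true_mask)
print(f"Dice: {score:.3f}")  # prints "Dice: 0.667"
```

In the toy example the masks share 2 foreground pixels and each has 3, giving 2 * 2 / (3 + 3) ≈ 0.667. A reported value such as 97.2% corresponds to a score of 0.972 on this scale.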

About the Authors

  • Priya Patel, Department of Biomedical Informatics, Stanford University, Stanford, CA 94305, USA
    Priya Patel is an associate professor in the Department of Biomedical Informatics at Stanford University. Their research focuses on biomedical engineering, and they have authored more than 31 publications in peer-reviewed journals.
  • Xiaofeng Liu, School of Computer Science, Fudan University, Shanghai 200433, China
    Xiaofeng Liu is an associate professor in the School of Computer Science at Fudan University. Their research focuses on energy systems, and they have authored more than 70 publications in peer-reviewed journals.
  • Thomas Müller, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
    Thomas Müller is a research fellow at the German Cancer Research Center (DKFZ). Their research focuses on energy systems, and they have authored more than 71 publications in peer-reviewed journals.