A Survey of Generalization and Adaptation in Medical Imaging Foundation Models


Abstract

Medical imaging is a cornerstone of modern healthcare, playing a critical role in diagnosis, treatment planning, and disease monitoring across a wide range of clinical applications. However, the development and deployment of machine learning models in this domain face persistent challenges stemming from domain shift—the divergence in data distributions across imaging centers, patient populations, devices, and acquisition protocols. These shifts degrade performance when models are applied outside the training domain, thereby undermining the generalizability and reliability of traditional supervised learning approaches in real-world clinical environments. Historically, research in domain adaptation and generalization has sought to mitigate this issue through techniques such as adversarial training, domain-invariant feature learning, and data augmentation. While these methods have shown moderate success, they often rely on access to labeled data, tailored adaptation procedures, or knowledge of the target domain, limiting their scalability and practicality. The recent advent of foundation models—large-scale, pre-trained models capable of zero-shot and few-shot inference—has introduced a paradigm shift in addressing these challenges. Leveraging vast and heterogeneous datasets, often in a self-supervised or weakly supervised manner, foundation models in medical imaging exhibit emergent properties that enable superior transferability and robustness across unseen domains and tasks. These models, including vision-language models (e.g., MedCLIP, GLoRIA, CheXzero) and large-scale unimodal encoders (e.g., Swin UNETR, TransUNet), encode rich semantic representations that are less sensitive to superficial domain-specific artifacts. As a result, they achieve state-of-the-art performance in tasks such as classification, segmentation, and report generation, even under substantial domain shifts and with minimal supervision.
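The zero-shot inference mechanism behind vision-language models such as MedCLIP and CheXzero can be sketched as matching image and text embeddings in a shared space: a prompt is written for each candidate label, and the image is assigned the label whose text embedding is most similar to its image embedding. The sketch below is illustrative only; the linear projection and random prompt vectors are hypothetical stand-ins for the pretrained image and text encoders of a real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for pretrained encoders. In a real system these
# would be the vision and text towers of a model like MedCLIP or CheXzero.
W_image = rng.normal(size=(512, 2048))  # toy "image encoder": linear projection
prompt_embeddings = {                   # toy "text encoder" output per prompt
    "a chest X-ray showing pneumonia": rng.normal(size=512),
    "a normal chest X-ray": rng.normal(size=512),
}

def normalize(v: np.ndarray) -> np.ndarray:
    """Project a vector onto the unit sphere so dot products act as cosine similarity."""
    return v / np.linalg.norm(v)

def zero_shot_classify(image_features: np.ndarray) -> str:
    """Return the prompt whose embedding best matches the image embedding."""
    img_emb = normalize(W_image @ image_features)
    scores = {p: float(img_emb @ normalize(t)) for p, t in prompt_embeddings.items()}
    return max(scores, key=scores.get)

label = zero_shot_classify(rng.normal(size=2048))
print(label)
```

Because classification reduces to embedding similarity, no task-specific training data or fine-tuning is required; new labels can be added simply by writing new prompts, which is what makes this paradigm attractive under domain shift.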
In this survey, we provide a comprehensive review of domain adaptation and generalization in the era of foundation models for medical imaging. We begin by tracing the historical evolution of domain robustness techniques, outlining their theoretical foundations and empirical limitations. We then detail the architectural and training paradigms that underpin foundation models, highlighting their ability to learn transferable, multi-modal, and clinically aligned representations. Through a comparative analysis, we evaluate the performance of foundation models against traditional domain adaptation techniques across a range of benchmarks, imaging modalities, and clinical settings. Furthermore, we explore the interpretability, label efficiency, and deployment implications of this emerging class of models. We also identify and discuss key open challenges—including ethical considerations, computational constraints, interpretability, and the lack of standardized benchmarks—and propose future research directions to ensure responsible and equitable advancement. These include developing efficient training strategies, designing clinically meaningful evaluation protocols, supporting multilingual and multimodal understanding, and integrating continual learning frameworks to adapt to evolving clinical practices. Overall, this survey aims to bridge the gap between traditional domain adaptation techniques and the emerging capabilities of foundation models, offering a unified perspective for researchers, practitioners, and policymakers seeking to develop robust, generalizable, and clinically trustworthy AI systems in medical imaging.
