STPath: A Generative Foundation Model for Integrating Spatial Transcriptomics and Whole Slide Images
Abstract
Spatial transcriptomics (ST) has shown remarkable promise in pathology applications, shedding light on the spatial organization of gene expression and its relationship to the tumor microenvironment. However, its clinical adoption remains constrained by the limited scalability of current sequencing technologies. While recent methods attempt to infer ST from whole slide images (WSIs) using pretrained image encoders, they remain restricted by limited gene coverage, organ-specific training, and the need for dataset-specific fine-tuning. In light of this, we introduce STPath, a generative foundation model pretrained on a large-scale collection of WSIs paired with ST profiles. This extensive pretraining enables STPath to directly predict gene expression across 38,984 genes and 17 organs without requiring downstream fine-tuning. STPath integrates multiple data modalities, including histological images, gene expression, organ type, and sequencing technology information, within a novel geometry-aware Transformer architecture. Unlike previous methods that directly map WSIs to gene expression, STPath is trained using a masked gene expression prediction objective guided by tailored noise schedules, effectively balancing the capture of gene-gene dependencies against the quality of its predictions. We evaluate STPath across 6 tasks spanning 23 datasets and 14 biomarkers, including gene expression prediction, spot imputation, spatial clustering, biomarker prediction, gene mutation prediction, and survival prediction. The results demonstrate STPath's strong ability to infer spatially resolved gene expression and reveal crucial pathological structures within tissue samples, underscoring its promise for scalable ST-based pathology applications.
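To make the idea of a masked gene expression objective with a noise schedule more concrete, the following is a minimal sketch under stated assumptions, not STPath's actual implementation. The abstract does not specify the schedule or the masking unit, so this sketch assumes that whole spots are masked (in line with the spot imputation task), that the masking ratio is drawn from a cosine schedule (a MaskGIT-style choice swapped in for illustration), and that the loss is a mean squared error computed only on masked spots. The function names `cosine_mask_ratio`, `mask_spots`, and `masked_mse` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]  # rows = spots, columns = genes


def cosine_mask_ratio(u: float) -> float:
    """Map u in [0, 1) to a masking ratio in (0, 1].

    ASSUMPTION: cosine schedule chosen for illustration; the actual
    STPath noise schedules are not described in the abstract.
    """
    return math.cos(math.pi / 2 * u)


def mask_spots(expr: Matrix, rng: random.Random,
               mask_value: float = 0.0) -> Tuple[Matrix, List[bool]]:
    """Hide the expression vectors of a randomly chosen subset of spots.

    ASSUMPTION: masking is applied per spot (all genes at once) and a
    masked spot is replaced by a constant mask value.
    """
    n_spots = len(expr)
    ratio = cosine_mask_ratio(rng.random())
    n_masked = max(1, math.ceil(ratio * n_spots))  # always mask at least one spot
    masked_idx = set(rng.sample(range(n_spots), n_masked))
    is_masked = [i in masked_idx for i in range(n_spots)]
    corrupted = [
        [mask_value] * len(row) if is_masked[i] else list(row)
        for i, row in enumerate(expr)
    ]
    return corrupted, is_masked


def masked_mse(pred: Matrix, target: Matrix, is_masked: List[bool]) -> float:
    """Mean squared error over the genes of masked spots only."""
    total, count = 0.0, 0
    for p_row, t_row, m in zip(pred, target, is_masked):
        if m:
            for p, t in zip(p_row, t_row):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0
```

In a training step of this kind, a model would receive the corrupted expression matrix together with the image features, predict expression for every spot, and be penalized only on the masked spots. Varying the masking ratio from sample to sample is one way to expose the model both to nearly complete contexts, which favor learning gene-gene dependencies, and to nearly empty ones, which resemble inference from the WSI alone.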