Frequency-Blended Diffusion Models for Synthetic Generation of Biologically Realistic Splice Site Sequences

Abstract

We present a frequency-blended diffusion framework for generating biologically realistic splice site sequences. Our approach combines a U-Net-based denoising diffusion probabilistic model with conditional nucleotide frequency priors derived from real donor (5′) and acceptor (3′) splice site sequences in Arabidopsis thaliana and Homo sapiens. By guiding the generative process with position-specific empirical base frequencies, the model captures both the local sequence motifs and the long-range dependencies that are critical for realistic splice site representation. We evaluate the synthetic sequences in two ways: directly, through sequence logos, GC content, and nucleotide conservation, and indirectly, through functional tests with state-of-the-art splice site classifiers (SpliceRover, SpliceFinder, DeepSplicer, Spliceator). Our results show that frequency blending substantially improves motif conservation, compositional fidelity, and model transferability. This work establishes frequency-blended diffusion as a promising strategy for generating high-quality nucleotide sequences for modeling, benchmarking, and data augmentation in genomics research.
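The abstract does not spell out how the frequency prior enters the generative process. As a rough illustration only, the sketch below estimates position-specific base frequencies from a handful of toy donor-like sequences and mixes them with a per-position probability matrix through a convex combination. The function names, the `alpha` weight, the pseudocount, the toy sequences, and the blending rule itself are all assumptions made for illustration and are not taken from the article; a uniform matrix stands in for the diffusion model's output.

```python
import numpy as np

ALPHABET = "ACGT"


def position_frequency_matrix(seqs, pseudocount=1.0):
    """Return an (L, 4) matrix of per-position base frequencies.

    All sequences are assumed to have the same length L and to contain
    only A, C, G, T. The pseudocount avoids zero probabilities.
    """
    length = len(seqs[0])
    counts = np.full((length, len(ALPHABET)), pseudocount, dtype=float)
    for seq in seqs:
        for pos, base in enumerate(seq):
            counts[pos, ALPHABET.index(base)] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def blend(model_probs, freq_prior, alpha=0.5):
    """Hypothetical blending rule: convex mix of model output and prior.

    alpha = 0 keeps the model output unchanged; alpha = 1 uses only the
    empirical prior. Rows are renormalised to guard against rounding drift.
    """
    mixed = (1.0 - alpha) * model_probs + alpha * freq_prior
    return mixed / mixed.sum(axis=1, keepdims=True)


# Toy donor-like sequences (illustrative, not real data).
donor_seqs = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
prior = position_frequency_matrix(donor_seqs)

# Stand-in for a model's per-position probabilities (uniform here).
model_probs = np.full_like(prior, 0.25)

blended = blend(model_probs, prior, alpha=0.7)
consensus = "".join(ALPHABET[i] for i in blended.argmax(axis=1))
print(consensus)
```

With an uninformative model output, the blended matrix inherits the prior's preferences, so the most probable base at each position follows the conserved GT dinucleotide of the toy donor set. In an actual diffusion pipeline the mixing would presumably occur during or after denoising, which the abstract leaves unspecified.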