Human whole epigenome modelling for clinical applications with Pleiades

Abstract

Gene regulation in humans extends beyond the four-letter genetic code. Cytosine methylation, in particular, functions as a critical epigenetic switchboard: it dynamically programs cellular identity, adapts gene expression in response to environmental cues, and underpins the onset and progression of numerous diseases. Here we present Pleiades, a series of whole-genome epigenetic foundation models in three sizes: 90M, 600M, and 7B parameters. Pleiades is trained on an extensive proprietary dataset of methylated and unmethylated human DNA sequences totalling 1.9T tokens. We introduce alignment embeddings and stacked hierarchical attention to provide precise epigenetic modelling without the need for extended context lengths. Together, these advances enable Pleiades to perform a diverse range of downstream biological and clinical tasks within a unified and scalable computational framework, including nucleotide-level regulatory prediction, realistic generation of cell-free DNA fragments, and fragment-level cell-type-of-origin classification. We specifically apply Pleiades to the early detection of Alzheimer's disease and Parkinson's disease in real-world clinical cohorts, achieving high accuracy. Integrating Pleiades with leading protein biomarkers yields state-of-the-art results, underscoring the complementary value of multi-modal approaches that combine epigenomic and proteomic data. By moving beyond the modelling of pure DNA sequences and beyond reliance on limited genomic regions, Pleiades establishes genome-wide epigenomic modelling as a new paradigm for clinical diagnostics, synthetic biology, and precision medicine.