Title: From Clinical Risk to Biological Aging - A Modular AI Framework for Traceable CpG Biomarker Prioritization
Abstract
Background Artificial intelligence is increasingly applied in epigenetics, particularly to study DNA methylation and its links to aging and disease. Recent predictive models can identify CpG sites associated with clinical outcomes, but they often operate as black boxes: they yield predictions without showing why the selected markers matter biologically, which makes findings hard to reproduce and hard to translate into biomedical insight. We therefore developed a modular framework that links CpG biomarker selection directly to biological annotation, so that risk prediction is tied back to aging biology.
Methods We analyzed DNA methylation data from a Canadian cohort comprising 92 samples with over 485,000 CpG sites. Preprocessing included normalization, variance filtering, and log-rank survival tests to select informative CpGs. Survival modeling was performed using Random Survival Forests and Cox proportional hazards models, with ensemble learners to enhance robustness. SHAP feature attribution was applied to identify influential CpGs. Biological annotation was integrated using enhancer and motif databases, and Sankey diagrams were used to visualize links between CpGs, regulatory elements, genes, and pathways.
Results The framework achieved high concordance indices across training, test, and cross-validation sets, indicating that predictive performance held up beyond the training data. SHAP analysis identified specific CpG sites that were consistently important for survival outcomes. Enhancer and motif annotation linked these sites to genes such as IL7R and CDKN2A, which play major roles in immune aging and cell cycle control. The visualizations made it possible to follow the chain from a CpG site, through its methylation changes, to the biological pathways it affects.
Conclusions We present a modular, interpretable AI framework for CpG biomarker prioritization that combines robust statistical modeling with biological annotation. This makes results easier to reproduce and more biologically meaningful, which helps bring methylation research into the core of aging studies. By connecting clinical predictions with gene regulatory networks, the framework offers a more transparent route to biomarker discovery, which could support better tools for personalized medicine and a deeper understanding of how aging works across the whole body.
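To make two of the steps named in the abstract concrete, variance filtering and the concordance index, here is a minimal Python sketch. It is not the authors' pipeline: the function names, the toy CpG IDs, and all values are hypothetical, and it assumes the concordance index reported is Harrell's C-index for right-censored data, which the abstract does not specify.

```python
from statistics import pvariance


def top_variance_cpgs(beta, k):
    """Return the k CpG IDs with the highest variance across samples.

    beta maps a CpG ID to its list of per-sample methylation values.
    """
    ranked = sorted(beta, key=lambda cpg: pvariance(beta[cpg]), reverse=True)
    return ranked[:k]


def concordance_index(times, events, risk):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when the sample with the shorter
    follow-up time had an observed event. It is concordant when that
    sample also has the higher predicted risk; risk ties count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: hypothetical CpG IDs and values, not taken from the study.
beta = {
    "cg_a": [0.10, 0.80, 0.20, 0.90],
    "cg_b": [0.50, 0.51, 0.49, 0.50],
    "cg_c": [0.30, 0.60, 0.35, 0.55],
}
selected = top_variance_cpgs(beta, 2)

times = [5, 12, 8, 20]           # follow-up time per sample
events = [1, 1, 0, 0]            # 1 = event observed, 0 = censored
risk = [0.9, 0.05, 0.6, 0.1]     # higher score = higher predicted risk
c_index = concordance_index(times, events, risk)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times. The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of 92 but would usually be replaced by a library routine on larger data.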