Data-driven discovery of core sleep biomarkers for predicting early cardiometabolic risk in a healthy population using machine learning
Abstract
Background
Identifying robust biomarkers for future cardiometabolic risk within the crucial “preventive window” in healthy individuals remains a major challenge. While numerous sleep metrics are linked to health, their hierarchical importance is unknown. This study aimed to leverage a data-driven machine learning paradigm to move beyond conventional metrics and objectively identify the core sleep-related physiological drivers for predicting the transition to early-stage cardiometabolic risk.
Methods
We conducted a longitudinal analysis of 447 initially healthy participants from the Sleep Heart Health Study (SHHS). To perform data-driven biomarker selection, a LASSO (L1-regularized) logistic regression model was trained on 16 high-quality clinical and polysomnographic features retained after a data quality audit that excluded high-missingness variables (e.g., heart rate variability). The performance of the final models was evaluated using 10 repeats of 10-fold cross-validation and compared using paired t-tests.
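The selection step described above can be illustrated with a minimal sketch of L1-regularized logistic regression fitted by proximal gradient descent. This is not the authors' code: the synthetic data, the penalty strength `lam`, the step size `lr`, and all function names are assumptions chosen for illustration, and a real analysis would more likely use an established library. The point is only to show how the L1 penalty drives the coefficients of uninformative features to exactly zero.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a coefficient toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=300):
    """Minimize mean log-loss + lam * sum(|w_j|) by proximal gradient descent.

    The intercept b is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the smooth loss, then shrinkage from the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def make_synthetic(n=200, p=6, seed=0):
    """Hypothetical data: only features 0 and 1 influence the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(1.5 * x[0] - 1.5 * x[1]) else 0 for x in X]
    return X, y


X, y = make_synthetic()
weights, intercept = fit_l1_logistic(X, y, lam=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("selected feature indices:", selected)
```

Features whose coefficient remains nonzero form the selected set. In practice the penalty strength would itself be tuned, for example by cross-validation.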
Findings
LASSO regression identified a parsimonious set of six core predictors. Notably, respiratory disturbance index (RDI) and minimum nocturnal oxygen saturation (min_spo2) emerged as the key biomarkers, superseding traditional sleep fragmentation metrics such as the arousal index. In the primary cross-validation analysis, the lean LASSO model demonstrated the strongest predictive performance (mean AUC = 0.698), statistically outperforming a complex model with all 16 features (mean AUC = 0.669, p < 0.0001). This advantage was maintained in high-risk subgroups.
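The paired comparison of cross-validated AUCs reported above can be sketched as follows. This is an illustrative outline rather than the study's analysis code: the function names are hypothetical, and the per-fold AUC values in the usage example are made-up placeholders, not study results. Each model is assumed to be scored on the same folds so that the AUCs can be paired.

```python
import math
from statistics import mean, stdev


def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_t_statistic(a, b):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-fold AUCs for two models evaluated on identical folds.
lean_aucs = [0.70, 0.72, 0.68, 0.71]
full_aucs = [0.66, 0.69, 0.67, 0.66]
t_stat, dof = paired_t_statistic(lean_aucs, full_aucs)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

With 10 repeats of 10-fold cross-validation, each list would hold 100 fold-level AUCs, and the p-value would be read from the t distribution with the returned degrees of freedom.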
Interpretation
Our data-driven approach reveals that physiological stress directly linked to sleep-disordered breathing and nocturnal hypoxemia, rather than general sleep fragmentation, is the primary driver of the transition towards early cardiometabolic risk in healthy individuals. This finding provides specific, translatable targets for precision preventive medicine, points towards novel mechanisms for early risk development, and offers a blueprint for developing next-generation screening tools, potentially integrated into wearable technology.