Genotyping TOMM40’523 Poly-T Polymorphisms Using Whole-Genome Sequencing

Abstract

The TOMM40’523 poly-T repeat polymorphism (rs10524523), located in the TOMM40 gene and in linkage disequilibrium with APOE, has been associated with cognitive decline and Alzheimer’s disease (AD) progression. Accurate genotyping of this polymorphism is crucial for understanding its role in neurodegeneration. Challenges in processing whole-genome sequencing (WGS) data have traditionally meant that additional PCR and targeted sequencing assays are required to genotype these polymorphisms. Here, we introduce a novel computational pipeline that integrates multiple short tandem repeat (STR) detection tools in an ensemble machine learning model using XGBoost. This approach leverages STR tool predictions, k-mer counts, and related features to enhance poly-T repeat length estimation. Using a sample of 1,202 participants from four cohort studies, we benchmarked our method against PCR-based measures. Our ensemble model outperformed individual STR tools, improving repeat length estimation accuracy (R² = 0.92) and achieving an accuracy rate of 93.2% with PCR-derived genotypes as the gold standard. Additionally, we validated our WGS-derived genotypes by replicating previously reported associations between TOMM40’523 variants and cognitive decline, demonstrating consistency with prior findings. Our results suggest that computational genotyping from WGS data is a scalable and reliable alternative to PCR-based assays, enabling broader investigations of TOMM40 variation in studies where WGS data are available.
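
To make the benchmarking step concrete, the following is a minimal sketch of how WGS-derived repeat lengths might be compared against PCR measurements. It is not the authors' pipeline. It assumes that R² is the coefficient of determination with PCR lengths as the reference, and that genotype accuracy is agreement on the conventional Short (19 or fewer T), Long (20 to 29) and Very Long (30 or more) allele classes; the abstract specifies neither definition. The function names and the toy lengths are hypothetical.

```python
from typing import Sequence


def r_squared(pcr: Sequence[float], wgs: Sequence[float]) -> float:
    """Coefficient of determination, treating PCR lengths as the reference."""
    mean_pcr = sum(pcr) / len(pcr)
    ss_res = sum((y - p) ** 2 for y, p in zip(pcr, wgs))
    ss_tot = sum((y - mean_pcr) ** 2 for y in pcr)
    return 1.0 - ss_res / ss_tot


def allele_class(length: float) -> str:
    """Assumed S / L / VL binning of a poly-T repeat length."""
    if length <= 19:
        return "S"
    if length <= 29:
        return "L"
    return "VL"


def class_accuracy(pcr: Sequence[float], wgs: Sequence[float]) -> float:
    """Fraction of alleles whose WGS-derived class matches the PCR class."""
    matches = sum(allele_class(a) == allele_class(b) for a, b in zip(pcr, wgs))
    return matches / len(pcr)


# Toy, made-up allele lengths for illustration only (not study data).
pcr_lengths = [16, 16, 21, 28, 34, 35, 36, 15]
wgs_lengths = [16, 15, 21, 30, 33, 35, 36, 15]

r2 = r_squared(pcr_lengths, wgs_lengths)
acc = class_accuracy(pcr_lengths, wgs_lengths)
print(f"R^2 = {r2:.3f}, class accuracy = {acc:.1%}")
```

In this toy example, one allele near the Long/Very Long boundary is misclassified even though its length estimate is off by only two repeats. This shows why a length-based R² and a class-based accuracy can diverge and are worth reporting together.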
