ANOMALY: A Snakemake pipeline for identifying NuMTs from Long-Read Sequencing Data

Nirmal Singh Mahar
Rachit Singh
Ishaan Gupta
Shweta Ramdas

Published on Apr 15, 2025

This article on Sciety

Abstract

Motivation

Nuclear mitochondrial DNA segments (NuMTs) can significantly affect cellular processes, including cancer development and disease progression. Current methods for calling NuMTs rely on short-read sequencing data and struggle to resolve complex NuMTs. Long-read sequencing data can overcome these limitations, but no workflow currently exists for detecting NuMTs from such data.

Results

Here, we introduce ANOMALY, a novel, easy-to-use workflow for detecting NuMTs from long-read sequencing data. The pipeline accepts either raw or aligned sequencing data, then calls and visualizes the NuMTs in each sample. On 50 simulated datasets, the pipeline demonstrated high accuracy, with a precision of 1.000, a recall of 0.989, and an F1-score of 0.994. The pipeline underscores the limitations of short-read data in resolving and capturing complex NuMTs while demonstrating that long-read data enables their accurate identification.
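
As a quick consistency check on the reported metrics, recall that the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the values quoted above; the function name `f1_score` is illustrative and is not part of the ANOMALY code base.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported for the 50 simulated datasets.
reported_precision = 1.000
reported_recall = 0.989

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.994
```

The recomputed value agrees with the reported F1-score of 0.994 to three decimal places.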

Availability and Implementation

The Snakemake pipeline uses Python, Bash, and R and is released under the open-source GNU GPL v3 license. The source code and detailed instructions for setting up and running the pipeline are available at https://github.com/Nirmal2310/ANOMALY.
