ANOMALY: A Snakemake pipeline for identifying NuMTs from Long-Read Sequencing Data
Abstract
Motivation
Nuclear mitochondrial DNA segments (NuMTs) can significantly affect cellular processes, including cancer development and disease progression. Current methods for calling NuMTs rely on short-read sequencing data and struggle to resolve complex NuMTs. Long-read sequencing data can overcome these limitations; however, no workflow currently exists to capture NuMTs from long-read sequencing data.
Results
Here, we introduce ANOMALY, a novel, easy-to-use workflow for detecting NuMTs from long-read sequencing data. The pipeline accepts either raw or aligned sequencing reads, then calls and visualizes the NuMTs present in each sample. On 50 simulated datasets, the pipeline achieved high accuracy, with a precision of 1.000, a recall of 0.989, and an F1-score of 0.994. ANOMALY underscores the limitations of short-read data in capturing and resolving complex NuMTs, while demonstrating that long-read data enables their accurate identification.
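As a quick consistency check on the reported metrics: the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal Python sketch below recomputes it from the reported precision and recall. It assumes the published F1 was derived from these pooled values rather than averaged across the 50 datasets, which the abstract does not specify, and the function name `f1_score` is illustrative only, not part of ANOMALY.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the 50 simulated datasets
f1 = f1_score(1.000, 0.989)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.994"
```

Because precision is exactly 1.000, the formula reduces to 2R / (1 + R) = 1.978 / 1.989 ≈ 0.9945, which rounds to the reported 0.994.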
Availability and Implementation
The Snakemake pipeline uses Python, Bash, and R and is released under the open-source GNU GPL v3 license. The source code and detailed instructions for setting up and running the pipeline are available at https://github.com/Nirmal2310/ANOMALY.