Evaluating a SNP calling pipeline for Mycobacterium leprae
Abstract
Mycobacterium leprae is the causative agent of leprosy, a persistent disease characterised by skin lesions and peripheral nerve numbness. Genome sequencing has been hampered because the organism cannot be cultured on artificial media. Advances in DNA extraction and sequencing technologies have since made molecular epidemiology feasible for M. leprae. However, no validated SNP calling pipeline currently exists. We first created a ground truth of pairwise SNP differences by aligning three complete M. leprae genomes with Minimap2. We then simulated reads from each genome and, using each of the other genomes as a reference, evaluated the precision and recall of short-read SNP calling with Snippy. Approximately 80% of SNPs were called, and the only false positives were due to ambiguous bases in certain genomes. Repeat region masking was found to be unnecessary for M. leprae SNP calling, unlike for M. tuberculosis. We find that SNP calling from short reads is robust and highly accurate for M. leprae, showing promise as a tool for molecular epidemiology studies to increase case detection and inform leprosy control strategies.
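As an illustration of the evaluation described above, precision and recall can be computed by comparing the set of called SNPs against the ground-truth set for one genome pair. The sketch below is a minimal example under stated assumptions, not the pipeline used in this study: the function name `evaluate_snp_calls`, the `(position, ref, alt)` tuple representation, and the example SNPs are all hypothetical and chosen only to show the arithmetic.

```python
from typing import Iterable, NamedTuple, Tuple

# Hypothetical representation of a SNP: (reference position, ref base, alt base).
Snp = Tuple[int, str, str]


class SnpMetrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float


def evaluate_snp_calls(truth: Iterable[Snp], called: Iterable[Snp]) -> SnpMetrics:
    """Compare called SNPs with a ground-truth set for one genome pair."""
    truth_set, called_set = set(truth), set(called)
    tp = len(truth_set & called_set)   # called and present in the truth set
    fp = len(called_set - truth_set)   # called but absent from the truth set
    fn = len(truth_set - called_set)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return SnpMetrics(tp, fp, fn, precision, recall)


# Made-up data for illustration only; these are not results from the study.
truth_snps = [(100, "A", "G"), (250, "C", "T"), (900, "G", "A"),
              (1500, "T", "C"), (2100, "A", "C")]
called_snps = [(100, "A", "G"), (250, "C", "T"), (900, "G", "A"),
               (1500, "T", "C"), (3000, "N", "A")]  # last call sits on an ambiguous base

metrics = evaluate_snp_calls(truth_snps, called_snps)
print(metrics)
```

In this toy case one true SNP is missed and one spurious call falls on an ambiguous reference base, so both recall and precision fall below 1. Matching on the full tuple rather than position alone is a design choice of the sketch: a call at the right position with the wrong alternate base counts as both a false positive and a false negative.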