Rapidly and reproducibly building a comprehensive catalogue of resistance-associated variants for M. tuberculosis
Abstract
Background
Catalogues of genetic variants associated with resistance underpin whole-genome sequencing (WGS)-based predictions of drug susceptibility in Mycobacterium tuberculosis, and are essential for molecular diagnostics and surveillance. The current gold standard catalogues released by the WHO represent substantial progress in using standardised data and methods to associate phenotypes with genotypes, but they remain opaque: the underlying data are not fully released and the catalogues are difficult to interpret. Open and reproducible methods would help address these problems, extending the important work already done.
Methods
We have developed an automated method, <monospace>catomatic</monospace>, that uses a binomial test to associate informative isolates with resistance or susceptibility, and built a catalogue (<monospace>catomatic-1</monospace>) from the same 39,358 samples used to construct the first WHO catalogue (<monospace>WHOv1</monospace>). We performed sensitivity analysis to optimise statistical and bioinformatic parameters for each drug, and benchmarked <monospace>catomatic-1</monospace> against <monospace>WHOv1</monospace> using an independent set of 14,380 isolates.
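To make the idea of a binomial-test classification concrete, the following is a minimal sketch and not the <monospace>catomatic</monospace> implementation. It assumes that each variant has counts of resistant and susceptible isolates, and that these counts are compared against a background resistance rate. The function names, the background rate of 0.1, the significance level of 0.05, the use of two one-sided tests, and the labels "R", "S" and "U" are all assumptions made for illustration.

```python
from math import comb


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binom_tail_le(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def classify_variant(n_resistant, n_susceptible, background=0.1, alpha=0.05):
    """Label a variant from the phenotypes of the isolates that carry it.

    Hypothetical rule (an assumption, not taken from the source):
      "R" if resistant isolates are significantly over-represented
          relative to the assumed background rate,
      "S" if they are significantly under-represented,
      "U" (unclassified) otherwise.
    """
    n = n_resistant + n_susceptible
    if n == 0:
        return "U"  # no evidence either way
    if binom_tail_ge(n_resistant, n, background) < alpha:
        return "R"
    if binom_tail_le(n_resistant, n, background) < alpha:
        return "S"
    return "U"


# Toy counts, invented for illustration only.
print(classify_variant(9, 1))   # mostly resistant carriers
print(classify_variant(0, 50))  # many susceptible carriers, none resistant
print(classify_variant(1, 5))   # too few isolates to decide
```

In a sketch like this, the background rate and significance level are exactly the kind of statistical parameters that a per-drug sensitivity analysis could vary.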
Findings
By using simpler statistics, <monospace>catomatic-1</monospace> algorithmically classified 1,329 variants, ranging from 5 for linezolid to 440 for pyrazinamide. <monospace>WHOv1</monospace> included generalisable rules added by a panel of experts, which increase the predictive coverage of <monospace>WHOv1</monospace>, but at the cost of reproducibility. Despite excluding such rules, <monospace>catomatic-1</monospace> achieves comparable performance for all drugs, with sensitivities for first-line agents above 88% on the independent test set. The automated process allowed us to efficiently explore the parameter space; for instance, detecting resistance-associated variants with low read support improved the sensitivity for all drugs.
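As an illustration of how a catalogue might be benchmarked on a test set, the sketch below computes sensitivity and specificity for a single drug. It assumes that an isolate is predicted resistant if it carries at least one variant catalogued as "R", which is a simplification chosen here and not necessarily the prediction rule used in the study. The variant names, the toy data, and the function names are hypothetical.

```python
def predict(isolate_variants, catalogue):
    """Call an isolate resistant if any of its variants is catalogued as 'R'."""
    if any(catalogue.get(v) == "R" for v in isolate_variants):
        return "R"
    return "S"


def sensitivity_specificity(isolates, catalogue):
    """Return (sensitivity, specificity) over (variants, phenotype) pairs.

    A value is None when its denominator is zero.
    """
    tp = fn = tn = fp = 0
    for variants, phenotype in isolates:
        call = predict(variants, catalogue)
        if phenotype == "R":
            if call == "R":
                tp += 1
            else:
                fn += 1
        else:
            if call == "S":
                tn += 1
            else:
                fp += 1
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


# Hypothetical catalogue and test isolates, invented for illustration only.
toy_catalogue = {"geneA@v1": "R", "geneB@v3": "S"}
toy_isolates = [
    ({"geneA@v1"}, "R"),
    ({"geneC@v2"}, "R"),  # resistant, but its variant is not catalogued
    ({"geneB@v3"}, "S"),
    (set(), "S"),
    ({"geneA@v1", "geneB@v3"}, "R"),
]

sens, spec = sensitivity_specificity(toy_isolates, toy_catalogue)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

The second toy isolate shows how an uncatalogued variant lowers sensitivity: the isolate is phenotypically resistant, but no "R" entry matches it.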
Interpretation
Accurate resistance catalogues can be built automatically using transparent and reproducible statistical methods. As more data are collected, catalogue content and performance will evolve, highlighting the need for proper version control, machine and human readability, and open access. This approach provides a foundation for real-time surveillance, diagnostics, and flexible application to diverse use cases in drug-resistant tuberculosis.