Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing
Abstract
Polyadenylation is a dynamic process that plays an important role in cellular physiology. Oxford Nanopore Technologies direct RNA sequencing provides a strategy for sequencing full-length RNA molecules and for analysis of the transcriptome and epi-transcriptome. Several tools are currently available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate these four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail lengths and therefore provide a direct way to measure the accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates that lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach because of its relatively fast run times, low coefficient of variation, and ease of use through its integration with base-calling.
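The two summary metrics named above, the percentage error of the mean estimate relative to the known tail length and the coefficient of variation, can be illustrated with a minimal sketch. The function name `summarise_tail_estimates`, the example read values, and the 30 nt reference length are hypothetical and chosen only for illustration; they are not taken from the paper's data or from any tool's output format.

```python
from statistics import mean, stdev


def summarise_tail_estimates(estimates, known_length):
    """Summarise per-read poly(A) estimates for one synthetic standard.

    estimates    -- per-read tail-length estimates in nucleotides
    known_length -- the designed (true) tail length of the standard
    """
    if len(estimates) < 2:
        raise ValueError("need at least two per-read estimates")
    m = mean(estimates)
    # Signed percentage error of the mean relative to the known length.
    pct_error = 100.0 * (m - known_length) / known_length
    # Coefficient of variation: sample standard deviation over the mean.
    cv = stdev(estimates) / m
    return {"mean": m, "pct_error": pct_error, "cv": cv}


# Hypothetical per-read estimates (nt) for a standard assumed to carry
# a 30 nt tail; these numbers are invented for illustration.
example = summarise_tail_estimates([28, 31, 33, 29, 34], known_length=30)
print(example)
```

Under this sketch, a tool would meet the "within 12%" criterion for a given standard when the absolute value of `pct_error` is at most 12, and a lower `cv` indicates less read-to-read spread in the estimates.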