Hicberg: Reconstruction of contact signals from repeated elements
Abstract
In the course of their evolution, genomes can acquire various repeated elements, such as transposons, ribosomal DNA, duplicated genes or tandem repeats. These sequences cannot be processed directly by current high-throughput sequencing pipelines, which generate short reads that cannot be unambiguously localized on reference genomes. We propose an algorithm called Hicberg that uses statistical inference, based on computed probability distributions, to precisely reassign the positions of reads from repeated sequences in different paired omics data, such as Hi-C data. We show that Hicberg can generate new insights into the impact of repeated elements on the spatial organisation of genomes.
Significance Statement
The genomes of microorganisms can contain various types of repeated sequences: duplicated genes, low-complexity sequences and transposons. The question of their potential impact on the spatial organization of genomes is now wide open. We propose Hicberg, an algorithm capable of reconstructing contact signals from repeated elements.
It computes statistical trends on the unambiguous part of the genome and then, by statistical inference, reassigns the position of multi-mapping reads.
The resulting complete chromosome contact maps yield new observations on the impact of repeated elements on chromosome architecture. In particular, they suggest the involvement of certain retrotransposons in the positioning of cohesins, the molecular motors behind chromosome loops.
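The reassignment idea stated above (learn a trend from unambiguously mapped read pairs, then use it to place multi-mapping pairs) can be illustrated with a minimal Python sketch. This is not Hicberg's implementation: the choice of genomic distance as the only statistical trend, the pseudocount, and the function names `distance_histogram`, `candidate_weights` and `reassign` are all assumptions made for illustration.

```python
import random
from collections import Counter


def distance_histogram(unambiguous_pairs, bin_size):
    """Count unambiguous contacts per genomic-distance bin.

    Assumption: all pairs lie on the same chromosome, so the distance
    between the two read positions is well defined.
    """
    counts = Counter()
    for pos_a, pos_b in unambiguous_pairs:
        counts[abs(pos_a - pos_b) // bin_size] += 1
    return counts


def candidate_weights(candidates, counts, bin_size, pseudocount=1.0):
    """Return a normalised probability for each candidate placement.

    Each candidate is a (pos_a, pos_b) tuple. The pseudocount (an
    illustrative choice) keeps unseen distances from getting zero weight.
    """
    raw = [counts.get(abs(a - b) // bin_size, 0) + pseudocount
           for a, b in candidates]
    total = sum(raw)
    return [w / total for w in raw]


def reassign(candidates, counts, bin_size, rng=random):
    """Draw one placement for an ambiguous pair, proportionally to its weight."""
    weights = candidate_weights(candidates, counts, bin_size)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy data (made up): three short-range contacts and one long-range contact.
unambiguous = [(0, 1000), (5000, 5800), (10000, 10500), (2000, 9000)]
counts = distance_histogram(unambiguous, bin_size=2000)

# An ambiguous pair whose second read maps equally well to two loci.
candidates = [(12000, 12900), (12000, 30000)]
print(candidate_weights(candidates, counts, bin_size=2000))
print(reassign(candidates, counts, bin_size=2000, rng=random.Random(0)))
```

Drawing a placement at random in proportion to its weight, rather than always keeping the most likely one, is a design choice of this sketch: it avoids stacking every ambiguous pair onto a single locus.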
Classification: Biophysics and Computational Biology section