Restoring data balance via generative models of T cell receptors for antigen-binding prediction

This article has 5 evaluations Published on
Read the full article Related papers
This article on Sciety

Abstract

Unveiling specificity in T cell recognition of antigens represents a major step to understand the immune system response. Many supervised machine learning approaches have been designed to build sequence-based predictive models of such specificity using binding and non-binding receptor-antigen data. Due to the scarcity of known specific T cell receptors for each antigen compared to the abundance of non-specific ones, available datasets are heavily imbalanced and make the goal of achieving solid predictive performances very challenging. Here, we propose to restore data balance through data augmentation using generative unsupervised models. We then use these augmented data to train supervised models for prediction of peptide-specific T cell receptors, or binding pairs of peptide and T cell receptor sequences. We show that our pipeline yields increased performance in prediction tasks of T cell receptors specificity. More broadly, our pipeline provides a general framework that could be used to restore balance in other computational problems involving biological sequence data.

Related articles

Related articles are currently not available for this article.