AllerStack: Predicting Allergenic Proteins with a Stacked Ensemble Approach
Abstract
Accurate prediction of protein allergenicity is essential for food and drug safety. Although machine learning and deep learning models have been applied to this task, limitations remain in dataset scale, feature representation, and model architecture. Here, we introduce AllerStack, a two-stage stacked ensemble model that integrates handcrafted features with learned features from the ESM2 protein language model for allergenicity classification. The model was developed on a balanced dataset of 11,930 allergenic and 11,930 non-allergenic proteins. We extracted amino acid composition, dipeptide composition, and physicochemical features using the Biopython library, alongside contextual embeddings from the pre-trained ESM2 model. In the base layer, four diverse classifiers, namely quadratic discriminant analysis (QDA), a support vector machine (SVM), k-nearest neighbors (KNN), and an artificial neural network (ANN), were trained separately on these features. Their predictions were then used as input to an XGBoost meta-classifier. AllerStack achieved 96.87% accuracy, a 96.86% F1-score, a Matthews correlation coefficient (MCC) of 93.75%, and an area under the ROC curve (AUC) of 0.99. A publicly accessible web server (https://cosylab.iiitd.edu.in/allerstack/) enables real-time allergenicity prediction from protein sequences. AllerStack provides a robust, interpretable, and user-friendly platform for allergen detection in computational biology.
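Amino acid composition (AAC) and dipeptide composition (DPC) are simple frequency features. The following is a minimal pure-Python sketch, not the authors' Biopython-based code. It assumes the 20 standard amino acids, normalizes AAC by the number of standard residues and DPC by the number of overlapping standard-residue pairs, and skips non-standard letters such as X. The function names are hypothetical. Physicochemical descriptors (which Biopython's `Bio.SeqUtils.ProtParam.ProteinAnalysis` class can supply) and ESM2 embeddings are not covered here.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted; other letters are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard residue in the sequence (20 features)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each ordered residue pair over overlapping windows (400 features)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Pairs containing a non-standard residue are not in the table and are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def handcrafted_features(seq: str) -> list[float]:
    """Concatenate AAC and DPC into a single 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

For example, `handcrafted_features("MKTAYIAKQR")` returns a 420-element vector whose first 20 entries and last 400 entries each sum to 1. In a stacked design of the kind described, such a vector would serve as one input view for a base classifier, with ESM2 embeddings forming another.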
Highlights
A balanced dataset of 23,860 proteins (11,930 allergenic and 11,930 non-allergenic).
Combines handcrafted and ESM2-derived features.
Stacked ensemble model with an accuracy of 96.87%.
SHAP-based interpretability at both the model and feature levels.
Public AllerStack web server (https://cosylab.iiitd.edu.in/allerstack/).
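The accuracy, F1-score, and MCC quoted for AllerStack are all functions of a binary confusion matrix. The sketch below is a stdlib-only illustration, not the authors' evaluation code; the function name and the counts used in the example are hypothetical. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, F1-score and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    # MCC uses all four cells and ranges from -1 (total disagreement) to 1 (perfect).
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts for a small balanced test set of 200 proteins.
print(classification_metrics(tp=90, fp=5, tn=95, fn=10))
```

Because MCC lies in [-1, 1], the MCC of 93.75% reported in the abstract corresponds to a value of 0.9375.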