A Traffic Classification Method Based on Multimodal Deep Learning
Abstract
Network traffic classifiers that perform well in controlled experiments often generalize poorly to real-world scenarios. To address this gap, this study introduces a multimodal deep learning framework for traffic classification. Traditional single-modality approaches adapt poorly to heterogeneous, encrypted, or obfuscated traffic. In contrast, the proposed method exploits the complementary nature of multiple data modalities, such as statistical features, time-series flows, and packet-level payload representations, to learn a more robust and discriminative traffic representation. By eliminating redundant features and aligning cross-modal information, the model captures richer semantic and temporal dynamics of network behavior. Specifically, convolutional neural networks (CNNs) extract spatial features from each modality, while long short-term memory (LSTM) networks model temporal dependencies and cross-modal interactions. This dual-pathway architecture lets the system learn both intra-modal patterns and inter-modal correlations, giving it a more holistic view of traffic characteristics. Experimental evaluations show that the proposed multimodal model significantly outperforms baseline single-modality methods, particularly in environments with dynamic traffic types, varying encryption levels, and high background noise. The framework thus offers a scalable and effective solution for real-time network monitoring and intelligent intrusion detection in complex and evolving network infrastructures.
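The dual-pathway idea described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the abstract gives no layer sizes, fusion order, or training details, so the filter count, kernel width, hidden size, random untrained weights, and the toy inputs `stats`, `flow`, and `payload` are all assumptions made for illustration. Each modality passes through its own 1D convolution with ReLU and global max pooling (the intra-modal pathway), and an LSTM cell then reads the per-modality embeddings in sequence to form a fused representation (the inter-modal pathway).

```python
import math
import random


def conv1d_relu_maxpool(signal, kernels):
    """Valid 1D convolution per kernel, then ReLU, then global max pooling."""
    features = []
    for kernel in kernels:
        width = len(kernel)
        responses = [
            max(0.0, sum(k * signal[i + j] for j, k in enumerate(kernel)))
            for i in range(len(signal) - width + 1)
        ]
        features.append(max(responses))
    return features


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, params):
    """One LSTM cell update; params maps gate name -> (W_x, W_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return [
            activation(
                sum(w * v for w, v in zip(w_x[n], x))
                + sum(w * v for w, v in zip(w_h[n], h))
                + b[n]
            )
            for n in range(len(h))
        ]

    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("candidate", math.tanh)
    c_new = [f_n * c_n + i_n * g_n for f_n, c_n, i_n, g_n in zip(f, c, i, g)]
    h_new = [o_n * math.tanh(c_n) for o_n, c_n in zip(o, c_new)]
    return h_new, c_new


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def fuse_modalities(modalities, n_filters=4, kernel_width=3, hidden=5, seed=0):
    """Embed each modality with its own filter bank, then fuse with an LSTM."""
    rng = random.Random(seed)  # untrained, randomly initialised weights (assumption)
    embeddings = []
    for signal in modalities:
        kernels = random_matrix(rng, n_filters, kernel_width)
        embeddings.append(conv1d_relu_maxpool(signal, kernels))
    params = {
        name: (
            random_matrix(rng, hidden, n_filters),
            random_matrix(rng, hidden, hidden),
            [0.0] * hidden,
        )
        for name in ("input", "forget", "output", "candidate")
    }
    h, c = [0.0] * hidden, [0.0] * hidden
    for emb in embeddings:  # the LSTM reads one modality embedding per step
        h, c = lstm_step(emb, h, c, params)
    return h


# Hypothetical toy inputs standing in for the three modalities.
stats = [0.2, 0.7, 0.1, 0.9, 0.4]                 # flow-level statistics
flow = [0.01, 0.05, 0.02, 0.30, 0.08, 0.04]       # inter-arrival times
payload = [b / 255 for b in bytes.fromhex("160301020001")]  # scaled payload bytes
fused = fuse_modalities([stats, flow, payload])
print(len(fused))
```

In a trained system the fused vector would feed a classification layer, and the weights would be learned rather than sampled. Running the LSTM across modalities, as here, is only one possible reading of "cross-modal interactions"; running it across time steps within a flow is another.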