SDRFPT-Net: A Spectral Dual-Stream Recursive Fusion Network for Multispectral Object Detection
Abstract
Multispectral object detection combining visible and infrared imaging has emerged as a crucial technology for all-day, all-weather surveillance systems. However, effectively integrating complementary information from different spectral domains remains challenging. This paper proposes SDRFPT-Net (Spectral Dual-stream Recursive Fusion Perception Target Network), a novel architecture for multispectral object detection that addresses this challenge through three modules. First, a Spectral Hierarchical Perception Architecture (SHPA) based on YOLOv10 employs a dual-stream structure to extract domain-specific features from the visible and infrared modalities. Second, a Spectral Recursive Fusion Module (SRFM) performs deep cross-modal feature interaction through a hybrid attention mechanism that integrates self-attention, cross-modal attention, and channel attention, coupled with a parameter-efficient recursive progressive fusion strategy. Third, a Spectral Target Perception Enhancement Module (STPEM) improves target region representation and suppresses background interference using lightweight mask prediction. Extensive experiments on the FLIR-aligned and LLVIP datasets show that SDRFPT-Net achieves state-of-the-art results: 0.785 mAP50 and 0.426 mAP50:95 on FLIR-aligned, and 0.963 mAP50 and 0.706 mAP50:95 on LLVIP. Comprehensive ablation studies validate the effectiveness of each proposed component. The findings suggest that SDRFPT-Net offers a promising solution for reliable multispectral object detection in challenging environments, making it valuable for applications in autonomous driving, security surveillance, and remote sensing.
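The recursive fusion idea described for SRFM can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the single shared value projection `w_v`, the residual sum, the sigmoid channel gate, and the step count are all assumptions made for illustration. The abstract states only that self-attention, cross-modal attention, and channel attention are combined and applied recursively in a parameter-efficient way. A practical version would use a tensor library and learned projections.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax_rows(m):
    """Numerically stable row-wise softmax."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(query, context, w_v):
    """Scaled dot-product attention of query tokens over context tokens.

    Passing the same tokens as query and context gives self-attention;
    passing the other modality as context gives cross-modal attention.
    w_v is an assumed value projection (hypothetical, not from the paper).
    """
    scale = math.sqrt(len(query[0]))
    context_t = [list(col) for col in zip(*context)]
    scores = [[s / scale for s in row] for row in matmul(query, context_t)]
    return matmul(softmax_rows(scores), matmul(context, w_v))

def channel_gate(tokens):
    """Assumed channel attention: scale each channel by sigmoid(channel mean)."""
    n = len(tokens)
    gates = [1.0 / (1.0 + math.exp(-sum(col) / n)) for col in zip(*tokens)]
    return [[v * g for v, g in zip(row, gates)] for row in tokens]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fusion_step(rgb, ir, w_v):
    """One hybrid step: residual + self-attention + cross-modal attention, then gating."""
    new_rgb = add(rgb, add(attention(rgb, rgb, w_v), attention(rgb, ir, w_v)))
    new_ir = add(ir, add(attention(ir, ir, w_v), attention(ir, rgb, w_v)))
    return channel_gate(new_rgb), channel_gate(new_ir)

def recursive_fusion(rgb, ir, w_v, steps=3):
    """Apply the same step repeatedly, reusing w_v, so depth adds no parameters."""
    for _ in range(steps):
        rgb, ir = fusion_step(rgb, ir, w_v)
    return rgb, ir

# Toy inputs: 3 tokens with 2 channels per modality, identity projection.
identity = [[1.0, 0.0], [0.0, 1.0]]
rgb_tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
ir_tokens = [[0.6, 0.5], [0.4, 0.3], [0.2, 0.1]]
fused_rgb, fused_ir = recursive_fusion(rgb_tokens, ir_tokens, identity, steps=3)
```

The point of the sketch is the loop in `recursive_fusion`: each iteration refines both streams with the same weights, so the number of refinement steps can grow without increasing the parameter count.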