Overlapped ViT-Based Feature-Enhanced Object Detection for Remote Sensing Images

Zhang Qing (张晴); Wang Yangzhu (王养柱); Qiu Huaxin (邱华鑫); Zhang Xiaoman (张小蔓); Wu Kun (吴坤); Li Ke (李可)

doi:10.3969/j.issn.1674-1579.2026.01.011


Abstract

Remote sensing images cover wide scenes, show large variations in object scale, have complex and diverse backgrounds, and contain many low-contrast small objects, all of which make accurate detection difficult. This paper proposes OLP-ViTDet (overlapped patches vision transformer detection), a feature-enhanced object detection method for remote sensing images built on an overlapped ViT backbone. Starting from ViT, the method adopts an overlapping patch-partition strategy so that the backbone captures fine-grained features spanning patch boundaries; the additional overlapping image patches strengthen cross-patch feature correlations and resolve the information fragmentation caused by the non-overlapping patches of a conventional ViT. A simplified feature pyramid improves the efficiency of multi-scale feature extraction and fusion, and a sliding-window attention mechanism reduces computational complexity while retaining global information interaction, which improves the detection accuracy of low-contrast small objects. In comparative experiments on the DIOR and NWPU VHR-10 datasets, OLP-ViTDet achieves a mean average precision (mAP) of 78.8% and 96.4%, respectively, with significant advantages in detecting small objects and objects with complex structures. The method substantially improves object detection accuracy in remote sensing images and offers a new approach to object recognition in space missions.
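The abstract does not state the patch size, stride, or window size used by OLP-ViTDet. The minimal Python sketch below therefore uses hypothetical values and a single-channel image stored as nested lists. It contrasts non-overlapping patching (stride equal to patch size, as in a standard ViT) with overlapping patching (stride smaller than patch size), and counts the query-key pairs scored by global versus windowed self-attention. The names `grid_size`, `extract_patches`, and `attention_pairs` are illustrative and do not come from the paper. The window cost assumes fixed, non-overlapping windows, which may differ from the paper's sliding-window scheme. In practice, patch embedding is usually a strided convolution, and overlapping variants set the stride below the kernel size. This is not claimed to be the authors' implementation.

```python
"""Sketch: overlapping vs. non-overlapping patches, and attention cost.

All parameter values are hypothetical illustrations, not OLP-ViTDet settings.
"""
from typing import List

Image = List[List[float]]


def grid_size(length: int, patch: int, stride: int) -> int:
    """Number of patch positions along one axis (no padding)."""
    if length < patch:
        raise ValueError("image is smaller than one patch")
    return (length - patch) // stride + 1


def extract_patches(img: Image, patch: int, stride: int) -> List[List[float]]:
    """Flatten every patch x patch block, row-major.

    stride == patch -> non-overlapping (standard ViT partition);
    stride <  patch -> neighbouring patches share pixels (overlapped partition).
    """
    h, w = len(img), len(img[0])
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            block = [img[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            patches.append(block)
    return patches


def attention_pairs(tokens_per_side: int, window: int = 0) -> int:
    """Query-key pairs scored by self-attention on a square token grid.

    window == 0 -> global attention over all N tokens (N * N pairs).
    window > 0  -> attention restricted to fixed window x window blocks
                   (assumes the grid divides evenly; no padding or shifting).
    """
    n = tokens_per_side * tokens_per_side
    if window == 0:
        return n * n
    if tokens_per_side % window:
        raise ValueError("grid is not divisible by the window size")
    return n * window * window


if __name__ == "__main__":
    # 8 x 8 toy image whose pixel values are unique, so shared pixels are easy to count.
    img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    non_overlap = extract_patches(img, patch=4, stride=4)
    overlap = extract_patches(img, patch=4, stride=2)
    print(len(non_overlap), len(overlap))                # 4 9
    print(len(set(overlap[0]) & set(overlap[1])))        # 8 pixels shared by neighbours
    print(attention_pairs(64), attention_pairs(64, 8))   # 16777216 262144
```

With a 1024×1024 input and a 16×16 patch, halving the stride from 16 to 8 raises the token grid from 64×64 to 127×127 (about 3.9 times as many tokens), and global attention cost grows with the square of the token count. This offers one plausible reason to pair overlapping patches with windowed attention, although the abstract does not give the authors' exact configuration.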
