Citation: ZHANG Q, WANG Y Z, QIU H X, et al. Overlapped ViT-based feature-enhanced object detection for remote sensing images[J]. Aerospace Control and Application, 2026, 52(1): 111-120 (in Chinese). DOI: 10.3969/j.issn.1674-1579.2026.01.011

Overlapped ViT-Based Feature-Enhanced Object Detection for Remote Sensing Images

Remote sensing images feature wide scene coverage, large variations in object scale, complex and diverse backgrounds, and many low-contrast small objects, all of which make accurate detection difficult. To address these issues, this paper proposes an object detection method for remote sensing images based on an overlapped ViT backbone with feature enhancement, named Overlapped Patches Vision Transformer Detection (OLP-ViTDet). Building on the ViT backbone, the method introduces an overlapping patch strategy to construct an overlapped ViT backbone that captures fine-grained features across patches. The additional overlapping image patches strengthen cross-patch feature correlations and thereby mitigate the information fragmentation caused by the non-overlapping patches of conventional ViT backbones. A simplified feature pyramid structure improves the efficiency of multi-scale feature extraction and fusion. A sliding-window attention mechanism reduces computational complexity while retaining the capability for global information interaction, which improves the detection accuracy of low-contrast small objects. Comparative experiments on the DIOR and NWPU VHR-10 datasets show that OLP-ViTDet achieves mean average precision (mAP) values of 78.8% and 96.4%, respectively, with clear advantages in detecting small objects and objects with complex structures. The method improves the accuracy of object detection in remote sensing images and offers a new perspective for object recognition in spatial tasks.
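The overlapping patch strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the patch size, the stride, or the embedding details used in OLP-ViTDet, so the function name `extract_patches` and the values below (16-pixel patches with a stride of 8 on a 32 x 32 image) are assumptions chosen only for illustration. The sketch shows the general idea: a stride smaller than the patch size makes neighbouring patches share pixels, whereas a stride equal to the patch size reproduces the non-overlapping tiling of a standard ViT.

```python
import numpy as np


def extract_patches(image, patch_size, stride):
    """Split an (H, W, C) image into flattened square patches.

    stride == patch_size gives the usual non-overlapping ViT tiling;
    stride < patch_size makes neighbouring patches share pixels.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    height, width, _ = image.shape
    row_starts = range(0, height - patch_size + 1, stride)
    col_starts = range(0, width - patch_size + 1, stride)
    patches = [
        image[r:r + patch_size, c:c + patch_size].reshape(-1)
        for r in row_starts
        for c in col_starts
    ]
    return np.stack(patches)


# Toy 32 x 32 RGB image with unique pixel values (assumed sizes).
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)

plain = extract_patches(image, patch_size=16, stride=16)      # no overlap
overlapped = extract_patches(image, patch_size=16, stride=8)  # 50% overlap

print(plain.shape, overlapped.shape)
```

With these assumed sizes the non-overlapping tiling yields a 2 x 2 grid of patches and the overlapped tiling a 3 x 3 grid, so object parts that straddle a patch boundary in the first case fall inside a single patch in the second. The cost is a longer token sequence, which is consistent with the abstract's use of windowed attention to limit computational complexity.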
