Deep Supervised Hashing Image Retrieval Method Based on Swin Transformer

    Abstract:

    Feature extraction in deep supervised hashing image retrieval has long been dominated by convolutional neural network (CNN) architectures, but the application of Transformers in computer vision has made it possible to replace CNNs with Transformers. To address the limitations of existing Transformer-based hashing methods, namely their inability to generate hierarchical representations and their high computational complexity, a deep supervised hashing image retrieval method based on Swin Transformer is proposed. The method takes the Swin Transformer network as its backbone and appends a hash layer at the end of the network to generate hash codes for images. By introducing locality and hierarchy into the model, the method effectively resolves both problems. Compared with 13 existing state-of-the-art methods, the proposed method substantially improves hash retrieval performance. Experiments on two widely used retrieval datasets, CIFAR-10 and NUS-WIDE, show that the proposed method achieves a highest mean average precision (mAP) of 98.4% on CIFAR-10, an average improvement of 7.1% over TransHash and 0.57% over VTS16-CSQ; on NUS-WIDE, it achieves a highest mAP of 93.6%, an average improvement of 18.61% over TransHash and 8.6% in retrieval accuracy over VTS16-CSQ.
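To make the architecture described in the abstract concrete, the following is a minimal sketch, assuming PyTorch and the timm library: a pretrained Swin Transformer backbone with a hash layer appended at the end of the network, a tanh relaxation for training, and sign binarization at retrieval time. The model name, bit width, and Hamming-distance ranking are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import timm  # assumed dependency providing pretrained Swin Transformer models


class SwinHash(nn.Module):
    """Illustrative Swin Transformer backbone with an appended hash layer."""

    def __init__(self, num_bits: int = 64):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of logits.
        self.backbone = timm.create_model(
            "swin_base_patch4_window7_224", pretrained=True, num_classes=0
        )
        # Hash layer: project backbone features to num_bits continuous codes.
        self.hash_layer = nn.Linear(self.backbone.num_features, num_bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                    # (B, num_features)
        return torch.tanh(self.hash_layer(feats))   # relaxed codes in (-1, 1)

    @torch.no_grad()
    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Binarize at retrieval time; sign() yields {-1, +1} codes
        # (sign(0) = 0 is rare in practice and ignored in this sketch).
        return torch.sign(self.forward(x))


if __name__ == "__main__":
    model = SwinHash(num_bits=64).eval()
    query = torch.randn(1, 3, 224, 224)      # dummy query image
    database = torch.randn(8, 3, 224, 224)   # dummy database images
    q, db = model.encode(query), model.encode(database)
    # Hamming distance between {-1, +1} codes: (K - q @ db^T) / 2.
    dist = (q.shape[1] - q @ db.t()) / 2
    ranking = dist.argsort(dim=1)            # nearest database items first
    print(ranking)

In this sketch, the relaxed codes produced by forward() would be trained with a supervised hashing loss (for example, a pairwise-similarity or central-similarity objective); the loss is omitted because the abstract does not specify it.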

Cite this article

苗壮, 赵昕昕, 李阳, 王家宝, 张睿. 基于Swin Transformer的深度有监督哈希图像检索方法[J]. 湖南大学学报(自然科学版), 2023, (8): 62-71.

History
  • Online publication date: 2023-08-29
Unless otherwise stated, once a manuscript is accepted by this journal, the author is deemed to have agreed to grant the journal the full rights to reproduce and disseminate the paper as a whole, including but not limited to the rights of reproduction, distribution, network dissemination, broadcasting, performance, translation, compilation, and adaptation. The journal may, as its work requires, permit cooperating databases, new media platforms, and other digital platforms to carry out digital and international dissemination. It is hereby declared.