High Dynamic Range Image Tone Mapping Based on Multimodal Learning
    Abstract:

    Existing tone mapping techniques face several key challenges in practice: unstable mapping results, difficulty in preserving the natural look of images, and limited adaptability to complex lighting and diverse scene types. To address these, this paper proposes a tone mapping method based on multimodal learning, which obtains cross-modal supervision through the shared semantic space of text and images, with the aim of more accurate, natural, and generally applicable tone mapping. Text-image matching information from a large contrastive text-image model guides unsupervised training, which effectively suppresses underexposed and overexposed regions and avoids the training instability and complexity associated with generative adversarial methods and contrastive learning. Experiments show that the proposed method performs well on multiple open benchmark datasets. Compared with mainstream tone mapping algorithms, it preserves the overall lighting atmosphere of the image while more effectively suppressing overexposed regions, brightening underexposed regions, retaining rich color detail, and enhancing visual layering, and it adapts better to varied lighting conditions and scene types. The work also demonstrates the considerable potential of multimodal learning for low-level vision tasks.
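
    The abstract does not give the loss formulation, so the following is only a minimal sketch of how text-image matching could act as an unsupervised training signal, not the authors' method. It assumes the tone-mapped image and a few exposure-describing text prompts have already been embedded into a shared space by some text-image model. The function names, the prompt wording, the toy three-dimensional embeddings, and the temperature value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prompt_matching_loss(image_emb, prompt_embs, positive_index=0, temperature=0.07):
    """Cross-entropy over a softmax of scaled image-prompt similarities.

    The loss is small when the image embedding matches the prompt at
    positive_index (e.g. "a well-exposed photo") better than the other
    prompts (e.g. "an underexposed photo", "an overexposed photo").
    """
    logits = [cosine(image_emb, p) / temperature for p in prompt_embs]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[positive_index]


# Hypothetical placeholder embeddings; a real system would obtain them
# from the image and text encoders of a pretrained text-image model.
prompts = [
    [1.0, 0.0, 0.0],  # "a well-exposed photo"   (assumed positive prompt)
    [0.0, 1.0, 0.0],  # "an underexposed photo"
    [0.0, 0.0, 1.0],  # "an overexposed photo"
]
good_output = [0.9, 0.1, 0.1]   # embedding close to the positive prompt
blown_output = [0.1, 0.1, 0.9]  # embedding close to the "overexposed" prompt

print(prompt_matching_loss(good_output, prompts))
print(prompt_matching_loss(blown_output, prompts))
```

    In a training loop, a loss of this kind would be computed on the network's tone-mapped output and minimized by gradient descent, so no paired ground-truth image is needed; in practice it would be written in an automatic-differentiation framework rather than plain Python.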

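Cite this article

Yue Huanjing (岳焕景), He Chang'an (何长安), Yang Jingyu (杨敬钰). High Dynamic Range Image Tone Mapping Based on Multimodal Learning[J]. Journal of Hunan University (Natural Sciences), 2025, 52(8): 14-22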

History
  • Published online: 2025-08-29