Self-supervised Monocular Depth Estimation Based on Semantic Assistance and Depth Temporal Consistency Constraints
Authors: 凌传武, 陈华, 徐大勇, 张小刚
    Abstract:

    Self-supervised monocular depth estimation methods trained on monocular image sequences have received considerable attention in recent years because they replace depth labels with the photometric consistency loss between adjacent frames as the supervisory signal for network training. The photometric consistency constraint relies on the static-world assumption, which moving objects in a monocular image sequence violate; this degrades both the accuracy of camera pose estimation and the accuracy of the photometric loss computed during self-supervised training. Detecting and removing moving-object regions yields a camera pose decoupled from object motion and eliminates the influence of those regions on the photometric loss. To this end, this paper proposes a self-supervised monocular depth estimation network based on semantic assistance and depth temporal consistency constraints. First, an offline instance segmentation network detects objects of dynamic categories that may violate the static-world assumption, and the corresponding regions are removed from the input to the pose network so that the estimated camera pose is decoupled from object motion. Second, the motion state of the dynamic-category objects is determined from semantic consistency and photometric consistency constraints, so that the photometric loss in moving regions does not affect the iterative update of the network parameters. Finally, a depth temporal consistency constraint is imposed on non-moving regions, explicitly aligning the depth estimated for the current frame with the depth projected from adjacent frames to further refine the depth predictions. Experiments on the KITTI, DDAD, and KITTI Odometry datasets show that the proposed method outperforms previous self-supervised monocular depth estimation methods.
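
The two masked losses described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the adjacent frame has already been warped into the current view, uses a plain L1 photometric term, and uses a normalised depth difference |d - d'| / (d + d') as one possible form of the depth temporal consistency term. The function names, the type aliases, and the static-pixel mask format are hypothetical and chosen only for illustration.

```python
from typing import List

Grid = List[List[float]]  # H x W grayscale image or depth map (assumed format)
Mask = List[List[bool]]   # True where a pixel is treated as static (assumed format)


def masked_photometric_loss(target: Grid, warped: Grid, static: Mask) -> float:
    """Mean absolute intensity difference over static pixels only.

    Assumption: `warped` is the adjacent frame already re-projected into the
    target view; pixels flagged as moving contribute nothing to the loss.
    """
    total, count = 0.0, 0
    for t_row, w_row, m_row in zip(target, warped, static):
        for t, w, keep in zip(t_row, w_row, m_row):
            if keep:
                total += abs(t - w)
                count += 1
    return total / count if count else 0.0


def depth_consistency_loss(estimated: Grid, projected: Grid, static: Mask) -> float:
    """Mean normalised depth difference over static pixels only.

    Assumption: depths are strictly positive, and the normalised form
    |d - d'| / (d + d') is an illustrative choice, not taken from the paper.
    """
    total, count = 0.0, 0
    for e_row, p_row, m_row in zip(estimated, projected, static):
        for e, p, keep in zip(e_row, p_row, m_row):
            if keep:
                total += abs(e - p) / (e + p)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 2 x 2 example; the top-right pixel is flagged as moving and ignored.
    static = [[True, False], [True, True]]
    target = [[0.5, 0.2], [0.1, 0.9]]
    warped = [[0.5, 0.8], [0.3, 0.9]]
    print(masked_photometric_loss(target, warped, static))

    estimated = [[2.0, 4.0], [1.0, 3.0]]
    projected = [[2.0, 1.0], [3.0, 3.0]]
    print(depth_consistency_loss(estimated, projected, static))
```

In both functions the large discrepancy at the masked pixel has no effect on the result, which mirrors the idea that moving regions should not drive the parameter updates.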

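Cite this article

凌传武, 陈华, 徐大勇, 张小刚. Self-supervised Monocular Depth Estimation Based on Semantic Assistance and Depth Temporal Consistency Constraints [J]. Journal of Hunan University (Natural Sciences), 2024, (8): 1-12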

History
  • Published online: 2024-08-26