Self-supervised Monocular Depth Estimation Based on Semantic Assistance and Depth Temporal Consistency Constraints

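Authors:
LING Chuanwu1, CHEN Hua2, XU Dayong3, ZHANG Xiaogang1†
(1. College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; 2. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; 3. Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450000, China)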

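Abstract:

Self-supervised monocular depth estimation methods trained on image sequences have attracted wide attention in recent years because they use the photometric consistency loss between adjacent frames, instead of depth labels, as the supervisory signal for network training. The photometric consistency constraint relies on the static-world assumption, but moving objects in monocular image sequences violate this assumption, which degrades both the accuracy of camera pose estimation and the accuracy of the photometric loss during self-supervised training. Detecting and removing moving-object regions yields a camera pose decoupled from object motion and eliminates the influence of these regions on the photometric loss. To this end, this paper proposes a self-supervised monocular depth estimation network based on semantic assistance and depth temporal consistency constraints. First, an offline instance segmentation network detects objects of dynamic categories that may violate the static-world assumption, and the corresponding regions are removed from the input to the pose network, yielding a camera pose decoupled from object motion. Second, the motion state of each dynamic-category object is determined from semantic consistency and photometric consistency constraints, so that the photometric loss in moving regions does not affect the iterative update of the network parameters. Finally, a depth temporal consistency constraint is imposed on non-moving regions, explicitly aligning the estimated depth of the current frame with the depth projected from adjacent frames to further refine the depth predictions. Experiments on the KITTI, DDAD, and KITTI Odometry datasets show that the proposed method outperforms previous self-supervised monocular depth estimation methods.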

Key words: monocular depth estimation; self-supervised learning; moving object; temporal consistency
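The three steps in the abstract (masking dynamic-class regions before pose estimation, excluding moving objects from the photometric loss, and a depth temporal consistency term on static pixels) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every function name is hypothetical. Several assumptions are made: view synthesis (warping the source frame, its semantic labels, and its depth into the target view using predicted depth and pose) is taken as already done; the photometric term is plain L1 rather than the SSIM/L1 mix many methods use; the motion test that combines semantic and photometric consistency is an illustrative rule with an assumed threshold `sem_thresh`; and the scale-normalized depth difference is only one possible form of the temporal consistency term.

```python
import numpy as np


def mask_pose_input(frame, instance_masks):
    """Zero out every dynamic-class instance (H x W boolean masks) so that the
    pose network sees only presumably static regions."""
    dynamic = np.zeros(frame.shape[:2], dtype=bool)
    for m in instance_masks:
        dynamic |= m
    return frame * (~dynamic)[..., None]


def photometric_error(target, warped):
    """Per-pixel L1 error (H x W) averaged over colour channels.
    Assumption: plain L1 stands in for the usual SSIM + L1 combination."""
    return np.abs(target - warped).mean(axis=-1)


def moving_instance_mask(instance_masks, target_sem, warped_sem,
                         err_warp, err_identity, sem_thresh=0.8):
    """Hypothetical motion test for dynamic-class instances.

    An instance is flagged as moving if either
      * semantic consistency fails: too few of its pixels keep the same label
        after warping (agreement ratio below sem_thresh), or
      * photometric consistency fails: warping with the estimated pose does not
        lower its mean error compared with the unwarped source frame.
    """
    moving = np.zeros(target_sem.shape, dtype=bool)
    for m in instance_masks:
        if not m.any():
            continue
        sem_agree = np.mean(warped_sem[m] == target_sem[m])
        photo_ok = err_warp[m].mean() < err_identity[m].mean()
        if sem_agree < sem_thresh or not photo_ok:
            moving |= m
    return moving


def masked_photometric_loss(err_warp, moving):
    """Mean photometric error over static (non-moving) pixels only."""
    static = ~moving
    return float(err_warp[static].mean()) if static.any() else 0.0


def depth_temporal_consistency(depth_t, depth_proj, moving, eps=1e-7):
    """Mean of |D_t - D_proj| / (D_t + D_proj) over static pixels, where
    D_proj is an adjacent frame's depth projected into the current view.
    Assumption: this scale-normalized difference is one choice of alignment term."""
    static = ~moving
    if not static.any():
        return 0.0
    diff = np.abs(depth_t - depth_proj) / (depth_t + depth_proj + eps)
    return float(diff[static].mean())
```

In use, `moving_instance_mask` would be computed once per target/source pair, and its complement gates both `masked_photometric_loss` and `depth_temporal_consistency`, so pixels on moving objects contribute nothing to either term while static pixels of dynamic-class objects (for example, parked cars) are still used.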

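Cite this article

LING Chuanwu, CHEN Hua, XU Dayong, ZHANG Xiaogang. Self-supervised monocular depth estimation based on semantic assistance and depth temporal consistency constraints[J]. Journal of Hunan University (Natural Sciences), 2024, (8): 1-12.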

History

Published online: 2024-08-26