International Conference on 3D Vision - Attention meets Geometry: Geometry Guided Spatial-Temporal Attention for Consistent Self-Supervised Monocular Depth Estimation

Authors: Patrick Ruhkamp, Daoyi Gao, Hanzhi Chen, Nassir Navab and Benjamin Busam
Abstract: Inferring geometrically consistent dense 3D scenes across a tuple of temporally consecutive images remains challenging for self-supervised monocular depth prediction pipelines. This paper explores how the increasingly popular transformer architecture, together with novel regularized loss formulations, can improve depth consistency while preserving accuracy. We propose a spatial attention module that correlates coarse depth predictions to aggregate local geometric information. A novel temporal attention mechanism further processes the local geometric information in a global context across consecutive images. Additionally, we introduce geometric constraints between frames regularized by photometric cycle consistency. By combining the proposed regularization with the novel spatial-temporal attention module, we fully leverage both the geometric and appearance-based consistency across monocular frames. This yields geometrically meaningful attention and improves temporal depth stability and accuracy compared to previous methods.
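To make the idea of attending across consecutive frames concrete, the following is a minimal sketch of generic scaled dot-product attention in plain Python. It is not the authors' spatial-temporal attention module: the function names `softmax` and `attend`, the toy feature vectors, and the choice to let tokens of one frame attend over tokens of both frames are all assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Generic scaled dot-product attention over lists of feature vectors.

    Returns the attended outputs and the attention weights per query.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one channel at a time.
        outputs.append(
            [sum(wj * v[d] for wj, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical toy features: two tokens per frame, two channels each.
frame_t = [[1.0, 0.0], [0.0, 1.0]]
frame_t1 = [[0.9, 0.1], [0.1, 0.9]]

# Tokens of frame t attend over tokens of both frames (temporal context).
context = frame_t + frame_t1
fused, attn = attend(frame_t, context, context)
```

In this toy setup each token of the first frame receives a weighted mix of features from both frames, with higher weights on tokens whose features are similar to its own. A learned model would additionally apply trained query, key, and value projections, which are omitted here.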

Paper registration

~~July 23~~ July 30, 2021

Paper submission

July 30, 2021

Supplementary

August 8, 2021

Tutorial submission

August 15, 2021

Tutorial notification

August 31, 2021

Rebuttal period

September 16-22, 2021

Paper notification

October 1, 2021

Camera ready

October 15, 2021

Demo submission

~~July 30~~ Nov 15, 2021

Demo notification

~~Oct 1~~ Nov 19, 2021

Tutorial

November 30, 2021

Main conference

December 1-3, 2021