Improving Depth Completion via Depth Feature Upsampling

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.


Yufei Wang1, Ge Zhang2, Shaoqian Wang1, Bo Li1, Qi Liu1, Le Hui1, Yuchao Dai1

1Northwestern Polytechnical University and Shaanxi Key Laboratory of Information Acquisition and Processing
2Beijing Institute of Tracking and Telecommunication Technology


Architecture


Abstract


The encoder-decoder network (ED-Net) is a common choice for existing depth completion methods, but its working mechanism is ambiguous. In this paper, we visualize the internal feature maps to analyze how the network densifies the input sparse depth. We find that the encoder features of ED-Net focus on the areas where input depth points are available. To obtain dense features and thus estimate complete depth, the decoder features tend to complement and enhance the encoder features through skip connections, making the fused encoder-decoder features dense; consequently, the decoder features themselves are also sparse. However, ED-Net derives these sparse decoder features from the dense fused features of the previous stage, and this "dense-to-sparse" process destroys the completeness of the features and loses information. To address this issue, we present a depth feature upsampling network (DFU) that explicitly utilizes the dense fused features to guide the upsampling of a low-resolution (LR) depth feature to a high-resolution (HR) one. The completeness of the features is maintained throughout the upsampling process, thus avoiding information loss. Furthermore, we propose a confidence-aware guidance module (CGM), which performs guidance with adaptive receptive fields (GARF), to fully exploit the potential of the dense features as guidance. Experimental results show that our DFU, a plug-and-play module, can significantly improve the performance of existing ED-Net based methods with limited computational overhead, achieving new SOTA results. Besides, the generalization capability to sparser depth is also enhanced.
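
Below is a minimal PyTorch sketch of the guided upsampling idea described in the abstract: an LR depth feature is upsampled under the guidance of a dense HR fused feature, with the guidance weighted by a predicted confidence map and aggregated over parallel dilated branches as a crude stand-in for adaptive receptive fields. All module names (ConfidenceAwareGuidance, DepthFeatureUpsample), layer choices, and hyper-parameters here are illustrative assumptions, not the paper's released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidenceAwareGuidance(nn.Module):
    """Weights a dense HR guidance feature by a predicted confidence map,
    so unreliable regions contribute less (a rough stand-in for CGM)."""
    def __init__(self, channels):
        super().__init__()
        self.conf = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid(),
        )
        # Parallel dilated branches approximate "adaptive receptive fields":
        # each branch sees a different context size, and per-pixel weights
        # select among them.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        self.select = nn.Conv2d(channels, len(self.branches), 1)

    def forward(self, guide):
        conf = self.conf(guide)                       # (B, 1, H, W) in [0, 1]
        w = torch.softmax(self.select(guide), dim=1)  # per-pixel branch weights
        out = sum(w[:, i:i + 1] * b(guide) for i, b in enumerate(self.branches))
        return conf * out                             # confidence-weighted guidance

class DepthFeatureUpsample(nn.Module):
    """Upsamples an LR depth feature to HR under dense-feature guidance,
    keeping the feature dense throughout (the core DFU idea)."""
    def __init__(self, lr_channels, guide_channels):
        super().__init__()
        self.guidance = ConfidenceAwareGuidance(guide_channels)
        self.fuse = nn.Conv2d(lr_channels + guide_channels, lr_channels,
                              3, padding=1)

    def forward(self, lr_feat, hr_guide):
        # Bilinearly lift the LR depth feature to the guidance resolution,
        # then fuse it with the confidence-weighted dense guidance.
        up = F.interpolate(lr_feat, size=hr_guide.shape[-2:],
                           mode="bilinear", align_corners=False)
        g = self.guidance(hr_guide)
        return self.fuse(torch.cat([up, g], dim=1))

For example, with lr_feat of shape (B, 64, H/2, W/2) and a fused encoder-decoder guidance feature hr_guide of shape (B, 32, H, W), the module returns an HR depth feature of shape (B, 64, H, W); because the guidance is dense, the upsampled feature stays dense rather than reverting to a sparse pattern.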


Video



Citation


@InProceedings{DFU_CVPR_2024,
  author    = {Wang, Yufei and Zhang, Ge and Wang, Shaoqian and Li, Bo and Liu, Qi and Hui, Le and Dai, Yuchao},
  title     = {Improving Depth Completion via Depth Feature Upsampling},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages     = {21104--21113},
  year      = {2024}
}