Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis

Asian Conference on Computer Vision (ACCV)

Xiang Guo¹,³,*, Guanying Chen²,*, Yuchao Dai¹, Xiaoqing Ye³, Jiadai Sun¹, Xiao Tan³, Errui Ding³

¹Northwestern Polytechnical University ²The Chinese University of Hong Kong, Shenzhen ³Baidu Inc.
* denotes equal contribution

Abstract

Neural Radiance Fields (NeRF) have recently revolutionized novel view synthesis (NVS) thanks to their superior performance. However, NeRF and its variants generally require a lengthy per-scene training procedure in which a multi-layer perceptron (MLP) is fitted to the captured images. To address this, voxel-grid representations have been proposed that significantly speed up training. These existing methods, however, can only handle static scenes, and how to develop an efficient and accurate dynamic view synthesis method remains an open problem. Extending static-scene methods to dynamic scenes is not straightforward, because both the scene geometry and appearance change over time. In this paper, building on recent advances in voxel-grid optimization, we propose a fast deformable radiance field method to handle dynamic scenes. Our method consists of two modules. The first module adopts a deformation grid to store 3D dynamic features and a light-weight MLP that decodes the interpolated features into the deformation mapping a 3D point in observation space to the canonical space. The second module contains a density grid and a color grid to model the geometry and appearance of the scene. Occlusion is explicitly modeled to further improve the rendering quality. Experimental results show that our method achieves performance comparable to D-NeRF using only 20 minutes of training, which is more than 70× faster than D-NeRF, clearly demonstrating the efficiency of our proposed method.

Contribution

We propose a fast deformable radiance field method based on the voxel-grid representation to enable space-time view synthesis for dynamic scenes. To the best of our knowledge, this is the first method that integrates voxel-grid optimization with a deformable radiance field.
We introduce a deformation grid to store the 3D dynamic features and adopt a light-weight MLP to decode the features into a deformation. Our method explicitly models occlusion to improve the results.
Our method produces rendering results comparable to D-NeRF within only 20 minutes, which is more than 70× faster.
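
The deformation-grid idea in the second contribution can be illustrated with a minimal sketch: a feature is trilinearly interpolated from a dense voxel grid at the query point, then decoded into a 3D offset that moves the point to the canonical space. The function names (`trilinear_sample`, `decode_offset`, `warp_to_canonical`), the nested-list grid layout, and the single linear layer standing in for the light-weight MLP are all assumptions made for illustration; the actual network also depends on inputs such as time, which are omitted here.

```python
import math


def trilinear_sample(grid, point):
    """Trilinearly interpolate a feature vector from a dense voxel grid.

    grid  : nested list indexed as grid[x][y][z] -> list of C floats
            (hypothetical layout; each axis needs at least 2 voxels)
    point : (x, y, z) in continuous voxel coordinates inside the grid
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    channels = len(grid[0][0][0])
    # Lower corner of the enclosing cell, clamped so corner + 1 is valid.
    base = [min(max(int(math.floor(p)), 0), n - 2) for p, n in zip(point, dims)]
    frac = [p - b for p, b in zip(point, base)]

    out = [0.0] * channels
    for dx in (0, 1):
        wx = frac[0] if dx else 1.0 - frac[0]
        for dy in (0, 1):
            wy = frac[1] if dy else 1.0 - frac[1]
            for dz in (0, 1):
                wz = frac[2] if dz else 1.0 - frac[2]
                corner = grid[base[0] + dx][base[1] + dy][base[2] + dz]
                for c in range(channels):
                    out[c] += wx * wy * wz * corner[c]
    return out


def decode_offset(feature, weights, bias):
    """Single linear layer used as a stand-in for the light-weight MLP."""
    return [sum(w * f for w, f in zip(row, feature)) + b
            for row, b in zip(weights, bias)]


def warp_to_canonical(point, grid, weights, bias):
    """Map an observation-space point to canonical space: x + offset(x)."""
    offset = decode_offset(trilinear_sample(grid, point), weights, bias)
    return [p + o for p, o in zip(point, offset)]
```

Storing features in a grid and keeping the decoder small is what makes each query cheap: most of the work is a lookup and a weighted sum over eight corners rather than a deep network evaluation.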

Fast Optimization of Dynamic NeRF

Neural Deformable Voxel Grid (NDVG) for fast optimization of dynamic view synthesis. The left side of the figure shows that our method converges in 20 minutes, which is 70× faster than D-NeRF. The right side of the figure visualizes the results after 1, 5, and 20 minutes of training.

Overview of the Method

Overview of our proposed method. Our method consists of (a) a deformation module to model the motion of the space points and (b) a canonical module to model the radiance field of the static scene at the canonical time. To render a ray cast from the camera center, we compute the deformation of all sampled points and transform them to the canonical space, where the density and color are computed. The pixel color is then rendered by (c) volume rendering.
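
Step (c) can be sketched with the standard emission-absorption quadrature used by NeRF-style methods. This page does not spell out the exact formulation (for example, the density activation or how the occlusion estimate enters), so the function below, including its name `composite_ray` and its arguments, is an assumption for illustration only. It takes the densities and colors already queried in canonical space for the samples along one ray.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, ordered front to back.

    densities : per-sample volume density sigma_i (non-negative floats)
    colors    : per-sample RGB triples
    deltas    : distance between consecutive samples along the ray
    Returns (pixel_rgb, remaining_transmittance).
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution to the pixel
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

The returned transmittance is the share of the ray that passed through every sample, which a renderer would typically blend with a background color.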

Visual Results of Estimated Occlusion

Occlusion Estimation Visualization. We visualize the estimated occlusion points at different times in blue. The first row shows the image at each time step, which gives insight into the motion. We warp the canonical grids to the corresponding time step using the deformation estimated by the deformation module. Points whose density is over 0.8 are visualized with their RGB colors.
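
The density filter used for this visualization can be sketched as follows. The helper name `select_dense_points` and the flat-list inputs are hypothetical, and the caption does not say which density scale or activation the 0.8 threshold applies to, so the values are taken as given.

```python
def select_dense_points(points, densities, colors, threshold=0.8):
    """Keep the (point, color) pairs whose density exceeds the threshold."""
    return [(p, c) for p, d, c in zip(points, densities, colors)
            if d > threshold]
```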