Neural Deformable Voxel Grid for
Fast Optimization of Dynamic View Synthesis

Asian Conference on Computer Vision (ACCV)


Xiang Guo1,3,*, Guanying Chen2,*, Yuchao Dai1, Xiaoqing Ye3, Jiadai Sun1, Xiao Tan3, Errui Ding3

1Northwestern Polytechnical University    2The Chinese University of Hong Kong, Shenzhen    3Baidu Inc.   
* denotes equal contribution

Overview Video



Abstract


Neural Radiance Fields (NeRF) have recently revolutionized the task of novel view synthesis (NVS) thanks to their superior performance. However, NeRF and its variants generally require a lengthy per-scene training procedure, in which a multi-layer perceptron (MLP) is fitted to the captured images. To address this, voxel-grid representations have been proposed to significantly speed up training. These existing methods, however, can only deal with static scenes, and how to develop an efficient and accurate dynamic view synthesis method remains an open problem. Extending static-scene methods to dynamic scenes is not straightforward, as both the scene geometry and appearance change over time. In this paper, building on recent advances in voxel-grid optimization, we propose a fast deformable radiance field method to handle dynamic scenes. Our method consists of two modules. The first module adopts a deformation grid to store 3D dynamic features, and a light-weight MLP that uses the interpolated features to decode the deformation mapping a 3D point in observation space to the canonical space. The second module contains a density grid and a color grid to model the geometry and appearance of the scene. Occlusion is explicitly modeled to further improve the rendering quality. Experimental results show that our method achieves performance comparable to D-NeRF using only 20 minutes of training, which is more than 70× faster than D-NeRF, clearly demonstrating the efficiency of our proposed method.


Contribution


  • We propose a fast deformable radiance field method based on the voxel-grid representation to enable space-time view synthesis for dynamic scenes. To the best of our knowledge, this is the first method that integrates voxel-grid optimization with a deformable radiance field.
  • We introduce a deformation grid to store the 3D dynamic features and adopt a light-weight MLP to decode the features into deformations. Our method explicitly models occlusion to improve the results.
  • Our method produces rendering results comparable to D-NeRF within only 20 minutes, which is more than 70× faster.
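
The deformation grid described above is queried at continuous 3D positions, which requires interpolating the features stored at neighboring voxels. Below is a minimal sketch of trilinear interpolation over a dense grid, written in plain Python for clarity. The function name, the nested-list grid layout, and the voxel-index coordinate convention are assumptions made for illustration and are not taken from the NDVG implementation.

```python
import math


def trilinear_interpolate(grid, point):
    """Interpolate a feature vector from a dense voxel grid.

    grid:  nested lists indexed as grid[x][y][z] -> list of floats (features).
           Assumed layout; each axis needs at least 2 voxels.
    point: (x, y, z) in continuous voxel-index coordinates.
    """
    sizes = (len(grid), len(grid[0]), len(grid[0][0]))
    base, frac = [], []
    for coord, size in zip(point, sizes):
        # Clamp so the 2x2x2 neighbourhood always stays inside the grid.
        coord = min(max(coord, 0.0), size - 1.0)
        lower = min(int(math.floor(coord)), size - 2)
        base.append(lower)
        frac.append(coord - lower)

    channels = len(grid[0][0][0])
    out = [0.0] * channels
    # Blend the 8 surrounding corners, weighted by proximity on each axis.
    for dx, wx in ((0, 1.0 - frac[0]), (1, frac[0])):
        for dy, wy in ((0, 1.0 - frac[1]), (1, frac[1])):
            for dz, wz in ((0, 1.0 - frac[2]), (1, frac[2])):
                weight = wx * wy * wz
                corner = grid[base[0] + dx][base[1] + dy][base[2] + dz]
                for c in range(channels):
                    out[c] += weight * corner[c]
    return out


# Toy 2x2x2 grid with a single feature channel equal to x + 2y + 4z.
toy_grid = [[[[x + 2 * y + 4 * z] for z in range(2)] for y in range(2)]
            for x in range(2)]
center_feature = trilinear_interpolate(toy_grid, (0.5, 0.5, 0.5))
```

In the method itself, the interpolated feature would then be passed to the light-weight MLP to decode a deformation; that decoding step is not shown here.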

Fast Optimization of Dynamic NeRF



Neural Deformable Voxel Grid (NDVG) for fast optimization of dynamic view synthesis. The left side of the figure shows that our method converges in 20 minutes, which is 70× faster than D-NeRF. The right side of the figure visualizes the results after 1, 5, and 20 minutes of training.


Overview of the Method



Overview of our proposed method. Our method consists of (a) a deformation module to model the motion of the space points and (b) a canonical module to model the radiance field of the static scene at the canonical time. To render a ray shot from the camera center, we compute the deformation of all sampled points and transform them to the canonical space, where the density and color are computed. The pixel color is then rendered by (c) volume rendering.
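
The rendering steps in the caption above can be sketched as follows. This is a minimal illustration that uses the standard NeRF-style compositing rule (alpha_i = 1 - exp(-sigma_i * delta_i), weighted by the accumulated transmittance). The callables `deform_fn` and `canonical_fn` are hypothetical stand-ins for the deformation and canonical modules, and the explicit occlusion handling of the method is omitted.

```python
import math


def render_ray(points, deltas, time, deform_fn, canonical_fn):
    """Composite one pixel color along a ray.

    points:       (x, y, z) samples along the ray, ordered near to far
    deltas:       distance covered by each sample along the ray
    deform_fn:    hypothetical callable (point, time) -> (dx, dy, dz) offset
    canonical_fn: hypothetical callable (point) -> (density, (r, g, b))
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for point, delta in zip(points, deltas):
        # (a) Deformation module: map the sample to the canonical space.
        offset = deform_fn(point, time)
        canonical_point = tuple(p + o for p, o in zip(point, offset))
        # (b) Canonical module: query density and color.
        density, rgb = canonical_fn(canonical_point)
        # (c) Volume rendering: accumulate the alpha-weighted color.
        alpha = 1.0 - math.exp(-density * delta)
        weight = transmittance * alpha
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return color, transmittance


# Toy scene: a red half-space at canonical x >= 1 that shifts with time.
def toy_deform(point, time):
    return (time, 0.0, 0.0)


def toy_canonical(point):
    density = 1.0 if point[0] >= 1.0 else 0.0
    return density, (1.0, 0.0, 0.0)


samples = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
steps = [0.5, 0.5, 0.5]
color_t0, trans_t0 = render_ray(samples, steps, 0.0, toy_deform, toy_canonical)
color_t1, trans_t1 = render_ray(samples, steps, 1.0, toy_deform, toy_canonical)
```

At time 0 only the last sample falls inside the toy surface, while at time 1 every sample does, so the second ray accumulates more color and ends with lower transmittance.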


Visual Results of Learned Geometry



Learned Geometry. We show examples of geometries learned by our model. For each example, we show rendered images and the corresponding disparity maps under two novel views and six time steps.

Visual Results Compared with D-NeRF



Qualitative Comparison. Synthesized images on the test set of the dataset. For each scene, we show an image rendered at a novel view, followed by zoomed-in crops of the ground truth, our NDVG, and D-NeRF.

Visual Results of Estimated Occlusion



Occlusion Estimation Visualization. We visualize the estimated occlusion points at different times in blue. The first row shows the image at each time step, which gives insight into the motion. We warp the canonical grids to the corresponding time step using the deformation estimated by the deformation module, and visualize the points whose density is over 0.8 with their RGB colors.

Citation


@InProceedings{Guo_2022_NDVG_ACCV,
  title     = {Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis},
  author    = {Guo, Xiang and Chen, Guanying and Dai, Yuchao and Ye, Xiaoqing and Sun, Jiadai and Tan, Xiao and Ding, Errui},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
  year      = {2022}
}