1Northwestern Polytechnical University
2The Chinese University of Hong Kong, Shenzhen
3Baidu Inc.
* denotes equal contribution
Recently, Neural Radiance Fields (NeRF) has revolutionized the task of novel view synthesis (NVS) with its superior performance. However, NeRF and its variants generally require a lengthy per-scene training procedure, in which a multi-layer perceptron (MLP) is fitted to the captured images. To address this challenge, voxel-grid representations have been proposed to significantly speed up training. However, these existing methods can only deal with static scenes. How to develop an efficient and accurate dynamic view synthesis method remains an open problem. Extending methods for static scenes to dynamic scenes is not straightforward, as both the scene geometry and appearance change over time. In this paper, building on recent advances in voxel-grid optimization, we propose a fast deformable radiance field method to handle dynamic scenes. Our method consists of two modules. The first module adopts a deformation grid to store 3D dynamic features and a lightweight MLP that decodes, from the interpolated features, the deformation mapping a 3D point in observation space to the canonical space. The second module contains a density grid and a color grid to model the geometry and appearance of the scene. Occlusion is explicitly modeled to further improve rendering quality. Experimental results show that our method achieves performance comparable to D-NeRF using only 20 minutes for training, which is more than 70× faster than D-NeRF, clearly demonstrating the efficiency of our proposed method.
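To make the two-module design concrete, below is a minimal PyTorch sketch of the forward pass. All names, grid resolutions, layer sizes, and the time-conditioning scheme are illustrative assumptions, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def trilerp(grid, pts):
    # Trilinearly interpolate a voxel grid (1, C, D, H, W) at points
    # normalized to [-1, 1], shape (N, 3); returns features of shape (N, C).
    coords = pts.view(1, -1, 1, 1, 3)                      # (1, N, 1, 1, 3)
    out = F.grid_sample(grid, coords, align_corners=True)  # (1, C, N, 1, 1)
    return out.view(grid.shape[1], -1).t()                 # (N, C)

class NDVGSketch(nn.Module):
    # Illustrative structure only: a deformation grid decoded by a
    # lightweight MLP, plus canonical density and color grids.
    def __init__(self, feat_dim=8, res=64):
        super().__init__()
        self.deform_grid = nn.Parameter(torch.zeros(1, feat_dim, res, res, res))
        self.deform_mlp = nn.Sequential(
            nn.Linear(feat_dim + 4, 64), nn.ReLU(),
            nn.Linear(64, 3))                              # 3D offset to canonical space
        self.density_grid = nn.Parameter(torch.zeros(1, 1, res, res, res))
        self.color_grid = nn.Parameter(torch.zeros(1, 3, res, res, res))

    def forward(self, pts, t):
        # (a) Deformation module: map observation-space points at time t
        #     to the canonical space via interpolated features + MLP.
        feat = trilerp(self.deform_grid, pts)              # (N, feat_dim)
        offset = self.deform_mlp(torch.cat([feat, pts, t], dim=-1))
        pts_canon = pts + offset
        # (b) Canonical module: query geometry (density) and appearance (color).
        sigma = F.softplus(trilerp(self.density_grid, pts_canon))  # (N, 1)
        rgb = torch.sigmoid(trilerp(self.color_grid, pts_canon))   # (N, 3)
        return sigma, rgb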
Neural Deformable Voxel Grid (NDVG) for fast optimization of dynamic view synthesis. The left side of the figure shows that our method converges in 20 minutes, which is 70× faster than D-NeRF. The right side visualizes the results after training for 1, 5, and 20 minutes.
Overview of our proposed method. Our method consists of (a) a deformation module to model the motion of space points and (b) a canonical module to model the radiance field of the static scene at the canonical time. To render a ray cast from the camera center, we compute the deformation of all sampled points and transform them to the canonical space, where their density and color are computed. The pixel color is then rendered by (c) volume rendering.
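To spell out step (c), here is a minimal sketch of the standard NeRF-style volume rendering quadrature (variable names are ours): per-sample opacities are composited front to back with accumulated transmittance.

import torch

def volume_render(sigma, rgb, deltas):
    # Standard alpha compositing along one ray (illustrative).
    # sigma: (S, 1) densities, rgb: (S, 3) colors, deltas: (S,) sample spacings.
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * deltas)  # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)     # product over j <= i
    trans = torch.cat([torch.ones(1), trans[:-1]])        # shift so T_1 = 1
    weights = alpha * trans                               # per-sample contribution
    return (weights[:, None] * rgb).sum(dim=0)            # composited pixel color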
Learned Geometry. We show examples of geometries learned by our model. For each scene, we show rendered images and the corresponding disparity under two novel views and six time steps.
Qualitative Comparison. Synthesized images on the test set of the dataset. For each scene, we show an image rendered at a novel view, followed by zoomed-in crops of the ground truth, our NDVG, and D-NeRF.
Occlusion Estimation Visualization. We visualize the estimated occlusion points at different times in blue. The first row shows the images at each time step, which give an insight into the motion. We warp the canonical grids to the corresponding time steps using the deformation estimated by the deformation module, and visualize points whose density is over 0.8 with their RGB colors.
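A hedged sketch of how such a visualization could be produced, assuming a canonical-to-time warp function warp_to_time is available (the released code may differ): canonical densities are thresholded at 0.8 and the surviving points are carried, with their colors, to the query time.

import torch

def occlusion_vis_points(grid_pts, sigma, rgb, warp_to_time, t, thresh=0.8):
    # Keep canonical grid points whose density exceeds the threshold,
    # then warp them to time t for visualization (illustrative only).
    keep = sigma.squeeze(-1) > thresh
    pts_t = warp_to_time(grid_pts[keep], t)  # assumed canonical -> time-t warp
    return pts_t, rgb[keep]                  # point positions and RGB colors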
@InProceedings{Guo_2022_NDVG_ACCV,
title = {Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis},
author = {Guo, Xiang and Chen, Guanying and Dai, Yuchao and Ye, Xiaoqing and Sun, Jiadai and Tan, Xiao and Ding, Errui},
booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
year = {2022}
}