^{1}Northwestern Polytechnical University
^{2}FNii and SSE, CUHK-Shenzhen
^{3}Baidu Inc.

^{*}guoxiang@mail.nwpu.edu.cn

This paper proposes a neural radiance field (NeRF) approach for novel view synthesis of dynamic scenes using forward warping. Existing methods often adopt a static NeRF to represent the canonical space, and render dynamic images at other time steps by mapping the sampled 3D points back to the canonical space with a learned \emph{backward flow} field. However, this backward flow field is non-smooth and discontinuous, which makes it difficult to fit with commonly used smooth motion models. To address this problem, we propose to estimate the \emph{forward flow} field and directly warp the canonical radiance field to other time steps. Such a forward flow field is smooth and continuous within the object region, which benefits motion model learning. To achieve this goal, we represent the canonical radiance field with voxel grids to enable efficient forward warping, and propose a differentiable warping process, including an average splatting operation and an inpaint network, to resolve the many-to-one and one-to-many mapping issues. Thorough experiments show that our method outperforms existing methods in both novel view rendering and motion modeling, demonstrating the effectiveness of our forward flow motion modeling. Code will be released.
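
The two mapping issues of forward warping can be illustrated with a minimal NumPy sketch of average splatting on a voxel grid. It is a sketch under stated assumptions, not the paper's implementation: each canonical voxel is pushed to the nearest target voxel, and `average_splat` is a hypothetical name. Target voxels that receive several canonical voxels store their mean (many-to-one), while voxels that receive none are flagged as holes, which is where an inpaint network would fill in (one-to-many).

```python
import numpy as np

def average_splat(features, flow, grid_shape):
    """Forward-warp a voxel grid by average splatting (nearest-voxel sketch).

    features:   (X, Y, Z, C) canonical per-voxel features (e.g. density + color)
    flow:       (X, Y, Z, 3) forward flow in voxel units, canonical -> time t
    grid_shape: (X', Y', Z') shape of the target grid
    Returns the warped (X', Y', Z', C) grid and a boolean hole mask.
    """
    C = features.shape[-1]
    # Integer coordinates of every canonical voxel, shape (X, Y, Z, 3).
    axes = [np.arange(s) for s in features.shape[:3]]
    coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    # Destination voxel of every canonical voxel, rounded to the nearest voxel.
    dst = np.rint(coords + flow).astype(int).reshape(-1, 3)
    feats = features.reshape(-1, C)
    # Discard voxels that leave the target grid.
    inside = np.all((dst >= 0) & (dst < np.array(grid_shape)), axis=1)
    dst, feats = dst[inside], feats[inside]
    # Flatten destinations so collisions can be accumulated with np.add.at.
    flat = np.ravel_multi_index(tuple(dst.T), grid_shape)
    n_vox = int(np.prod(grid_shape))
    acc = np.zeros((n_vox, C))
    count = np.zeros(n_vox)
    np.add.at(acc, flat, feats)        # sum of all features landing in a voxel
    np.add.at(count, flat, 1.0)        # number of canonical voxels landing there
    hole = count == 0                  # one-to-many: nothing arrived, needs inpainting
    acc[~hole] /= count[~hole, None]   # many-to-one: average the arrivals
    return acc.reshape(*grid_shape, C), hole.reshape(grid_shape)
```

For example, with four voxels on a line holding features [1, 2, 3, 4] and x-flows [1, 0, 1, 0], voxels 1 and 3 each receive two inputs and store their means (1.5 and 3.5), while voxels 0 and 2 become holes. Rounding to the nearest voxel gives no gradient with respect to the flow; a soft, distance-weighted splat would be needed for end-to-end training, which this sketch does not attempt.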

- To the best of our knowledge, we are the first to investigate forward warping in dynamic view synthesis for general scenes. We propose a novel canonical-space NeRF with forward flow motion modeling for dynamic view synthesis. Thanks to the forward flow field, our method better represents object motions and can explicitly recover the trajectory of a surface point.
- We introduce a voxel-grid-based canonical radiance field to keep the computation of forward warping tractable, and propose a differentiable forward warping method, including an average splatting operation and an inpaint network, to solve the many-to-one and one-to-many issues of forward warping.
- Experiments show that our method outperforms existing methods on the D-NeRF, HyperNeRF and NHR datasets, as well as on our proposed dataset.
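
A voxel-grid field is commonly queried by trilinearly interpolating per-voxel features at continuous 3D points, after which a small MLP decodes the features. Whether the paper samples its grids exactly this way is an assumption here; `trilinear_lookup` is a hypothetical helper and the MLP decoder is omitted.

```python
import numpy as np

def trilinear_lookup(grid, pts):
    """Trilinearly interpolate a voxel feature grid.

    grid: (X, Y, Z, C) per-voxel features; every spatial size must be >= 2
    pts:  (N, 3) query points in voxel units; clamped to the grid bounds
    Returns (N, C) interpolated features.
    """
    size = np.array(grid.shape[:3])
    pts = np.clip(np.asarray(pts, dtype=float), 0, size - 1)
    # Lower corner of the enclosing cell, capped so that p0 + 1 stays inside.
    p0 = np.minimum(np.floor(pts).astype(int), size - 2)
    w = pts - p0  # fractional offsets in [0, 1]
    out = np.zeros((pts.shape[0], grid.shape[3]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Corner weight = product of the per-axis linear weights.
                weight = ((w[:, 0] if dx else 1 - w[:, 0])
                          * (w[:, 1] if dy else 1 - w[:, 1])
                          * (w[:, 2] if dz else 1 - w[:, 2]))
                corner = grid[p0[:, 0] + dx, p0[:, 1] + dy, p0[:, 2] + dz]
                out += weight[:, None] * corner
    return out
```

Because the weights are smooth in the query position, the lookup is differentiable with respect to both the grid features and the query points, which is what makes explicit grids practical to optimize.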

**Comparison of backward flow and forward flow.**
This figure shows an example of how backward and forward flows change over time. **(a)** An example of a dynamic scene. **(b)** As the bucket is lifted up, different types of points pass through the green point $\mathbf{p}$, so very different backward flows are needed to map this point back to canonical space. **(d)** The norm of the backward flow changes abruptly, i.e., it is not smooth. **(c)** In contrast, the forward flow of position $\mathbf{q}$, which maps a fixed object point from canonical space to other times, is smooth and continuous. **(e)** The corresponding change in the norm of the forward flow.
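
The contrast between panels (d) and (e) can be reproduced with a hypothetical 1D toy. The scene is an assumption made for illustration, not data from the paper: an object occupying [0, 1] at canonical time translates rigidly to the right while the background stays static. The backward flow is evaluated at a fixed world position, the forward flow at a fixed object point.

```python
import numpy as np

# Hypothetical 1D scene (an assumption, not the paper's data): an object that
# occupies [0, 1] at canonical time t = 0 translates right at speed v, while
# the background stays static.
v = 2.0
times = np.linspace(0.0, 1.0, 11)

def backward_flow(p, t):
    """Offset that maps world position p at time t back to canonical space."""
    canonical_pos = p - v * t
    on_object = (canonical_pos >= 0.0) & (canonical_pos <= 1.0)
    # Object points map back by -v * t; static background points do not move.
    return np.where(on_object, -v * t, 0.0)

def forward_flow(q, t):
    """Offset that maps canonical object point q (0 <= q <= 1) to time t.
    Under a rigid translation it is the same for every q."""
    assert 0.0 <= q <= 1.0, "q must lie on the canonical object"
    return v * np.asarray(t, dtype=float)

bwd = backward_flow(1.5, times)  # fixed world position p = 1.5
fwd = forward_flow(0.5, times)   # fixed object point q = 0.5
print(np.round(bwd, 2))
print(np.round(fwd, 2))
```

Sampled every 0.1 time units, the backward flow at p = 1.5 stays at 0 until the object arrives, jumps to -0.6, and later snaps back from -1.4 to 0 as the object leaves, whereas the forward flow of the object point grows by a constant 0.2 per step. A smooth motion model fits the second signal easily but must approximate two discontinuities in the first.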

**Overview of our proposed method.** **a)** We represent a static scene at canonical time with a voxel-grid-based radiance field for density and color and a voxel-grid-based trajectory field for deformations; **b)** we first forward-warp the canonical radiance field using the forward flow by average splatting; **c)** we then inpaint the warped radiance field using an inpaint network. Specifically, **1. Voxel Grid Based Canonical Field** contains two models. The canonical radiance field $\mathbf{V}_{\text{R}}^{\text{Can}}$ is estimated by a Light MLP that takes the canonical radiance feature $\mathbf{V}_{\text{Rf}}^{\text{Can}}$ and the corresponding 3D coordinates $\mathbf{V}_{\text{p}}^{\text{Can}}$ as input. The canonical trajectory field $\mathbf{V}_{\text{T}}^{\text{Can}}$ is estimated by another Light MLP that takes the deformation feature and coordinates as input, from which the deformation flow $\mathbf{V}_{\text{flow}}^{t}$ from canonical space to time $t$ is obtained. **2. Differentiable Forward Warping** first warps $\mathbf{V}_{\text{R}}^{\text{Can}}$ to obtain the radiance field $\mathbf{V}_{\text{R}}^{t}$ at time $t$, which is then inpainted by the inpaint network to give $\mathbf{V}_{\text{R}_{\text{Inp}}}^{t}$. **3. Volume Rendering** renders the colors of rays at time $t$ from $\mathbf{V}_{\text{R}_{\text{Inp}}}^{t}$.
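
Step 3 can be illustrated with the standard NeRF quadrature, $\hat{C} = \sum_i T_i\,(1 - e^{-\sigma_i \delta_i})\,\mathbf{c}_i$ with $T_i = \prod_{j<i} e^{-\sigma_j \delta_j}$. The sketch below assumes that densities $\sigma_i$ and colors $\mathbf{c}_i$ have already been sampled along the ray from $\mathbf{V}_{\text{R}_{\text{Inp}}}^{t}$; `render_ray` is a hypothetical helper, and that the paper uses exactly this formulation is an assumption.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Alpha-composite N samples along one ray (standard NeRF quadrature).

    sigmas: (N,) densities, colors: (N, 3) RGB, deltas: (N,) sample spacings.
    Returns the composited RGB color and the per-sample weights.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each ray segment
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights
```

For instance, a ray whose first sample is empty ($\sigma = 0$) and whose second sample is nearly opaque returns almost exactly the second sample's color, since everything behind it is occluded.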

Novel view synthesis results on the selected test set of the dataset, comparing ours with the ground truth, D-NeRF and TiNeuVox. Our model yields cleaner images with more details.

**Qualitative comparison on HyperNeRF Dataset.** Our results are closer to ground truth than other methods.

Comparison of the canonical radiance field with D-NeRF. The error maps between the ground truth and the rendered images show that the canonical frame produced by our method is closer to the ground truth, whereas the results of D-NeRF are blurry and exhibit large displacements.

**Trajectory learned by the canonical trajectory field.** Light blue marks the canonical frame, and the curve represents the historical motion trajectory. More results can be found in the video.
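
Because the flow is defined forward, the trajectory of a canonical point $\mathbf{p}$ can be read off directly as $\mathbf{p} + \text{flow}(\mathbf{p}, t)$ over a sequence of times, without inverting a backward mapping at each time step. This section does not describe how the trajectory field is parameterized, so the sketch below assumes, hypothetically, that it stores per-point displacements at a few key times and interpolates linearly between them; `flow_at` and `trajectory` are illustrative names.

```python
import numpy as np

def flow_at(traj, key_times, t):
    """Hypothetical trajectory field: traj has shape (K, N, 3), the displacement
    of N canonical points at K increasing key times. Linearly interpolates the
    forward flow at time t (assumed to lie in [key_times[0], key_times[-1]])."""
    k = np.clip(np.searchsorted(key_times, t) - 1, 0, len(key_times) - 2)
    w = (t - key_times[k]) / (key_times[k + 1] - key_times[k])
    return (1 - w) * traj[k] + w * traj[k + 1]

def trajectory(points, traj, key_times, times):
    """Positions p + flow(p, t) of canonical points at each query time.
    Returns an array of shape (len(times), N, 3)."""
    return np.stack([points + flow_at(traj, key_times, t) for t in times])
```

With key times [0, 1, 2] and displacements (0, 0, 0), (1, 0, 0), (1, 1, 0) for a single point at the origin, querying t = 0, 0.5, 1, 1.5, 2 traces an L-shaped curve through (0.5, 0, 0) and (1, 0.5, 0).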

```
@InProceedings{Guo_2023_ICCV,
author = {Guo, Xiang and Sun, Jiadai and Dai, Yuchao and Chen, Guanying and Ye, Xiaoqing and Tan, Xiao and Ding, Errui and Zhang, Yumeng and Wang, Jingdong},
title = {Forward Flow for Novel View Synthesis of Dynamic Scenes},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {16022--16033}
}
```