School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
Rolling shutter (RS) images can be viewed as the result of the row-wise combination of global shutter (GS) images captured by a virtual moving GS camera over the period of camera readout time. The RS effect poses significant difficulties for downstream applications. In this paper, we propose to invert the above RS imaging mechanism, i.e., to recover a high-framerate GS video from consecutive RS images, achieving RS temporal super-resolution (RSSR). This extremely challenging problem, e.g., recovering 1440 GS images from two 720-height RS images, is far from being solved by naive end-to-end learning. To address this challenge, we exploit the geometric constraints in the RS camera model, thus achieving geometry-aware inversion. Specifically, we make three contributions toward resolving the above difficulties: (i) formulating the bidirectional RS undistortion flows under a constant-velocity motion model, (ii) building the connection between the RS undistortion flow and the optical flow via a scaling operation, and (iii) developing a mutual conversion scheme between the varying RS undistortion flows that correspond to different scanlines. Building upon these formulations, we propose the first RS temporal super-resolution network, with a cascaded structure, to extract high-framerate global shutter video. Our method explores the underlying spatio-temporal geometric relationships within a deep learning framework, requiring no extra supervision beyond the middle-scanline ground-truth GS images. Moreover, our method supports efficient explicit propagation to generate the GS image corresponding to any scanline. Experimental results on both synthetic and real data show that our method produces high-quality GS image sequences with rich details, outperforming state-of-the-art methods.
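To make contributions (ii) and (iii) concrete, here is a minimal sketch of the constant-velocity geometry; the notation (inter-row readout delay $\tau$, inter-frame interval $T$, constant per-pixel velocity $\mathbf{v}$, and source/matched rows $y_0$, $y_1$) is illustrative and may differ from the papers' exact formulation:

$$\mathbf{F}(\mathbf{x}) = \mathbf{v}\left(T + (y_1 - y_0)\,\tau\right), \qquad \mathbf{U}_s(\mathbf{x}) = \mathbf{v}\,(s - y_0)\,\tau,$$

so the undistortion flow toward the GS image of scanline $s$ is a per-pixel scaling of the optical flow,

$$\mathbf{U}_s(\mathbf{x}) = \frac{(s - y_0)\,\tau}{T + (y_1 - y_0)\,\tau}\,\mathbf{F}(\mathbf{x}), \qquad \mathbf{U}_{s'}(\mathbf{x}) = \frac{s' - y_0}{s - y_0}\,\mathbf{U}_s(\mathbf{x}),$$

where the first identity is the scaling operation of (ii), the scalar field in front of $\mathbf{F}$ playing the role of a correlation map, and the second is the mutual conversion between scanlines of (iii).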
The RS image is generated by sequentially combining rows of time-varying GS images over the readout period, while our rolling shutter temporal super-resolution (RSSR) pipeline reverses this process, i.e., it extracts the latent GS image sequence from two consecutive RS images.
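As a concrete (if memory-hungry) illustration of this forward model, the following sketch synthesizes an RS image by sampling row y from the GS frame exposed at the readout time of row y; the stacked-frame representation is an assumption for illustration only:

```python
import numpy as np

def synthesize_rs(gs_stack):
    """Row-wise RS formation: take row y from the virtual GS frame captured
    at the readout time of row y. gs_stack has shape (H, H, W, C): one GS
    frame per scanline. This is the forward model that RSSR inverts."""
    H = gs_stack.shape[0]
    return np.stack([gs_stack[y, y] for y in range(H)])
```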
Given two consecutive RS frames as input, we first estimate the bidirectional optical flows. Then, a UNet architecture resolves the correlation maps. Next, the middle-scanline RS undistortion flows can be computed explicitly in closed form. Finally, we adopt softmax splatting to generate the target middle-scanline GS frames. Note that during training, our main network is designed to predict only the latent GS images corresponding to the middle scanline. At test time, in contrast, the RS undistortion flow for any scanline $s\in[0,h-1]$ can be propagated explicitly (see dashed arrow), followed by recovery of the GS image corresponding to scanline $s$.
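A minimal sketch of this inference path is given below, assuming the constant-velocity model; the function names, the propagation ratio, and the splatting interface (e.g., the softmax-splatting operator at github.com/sniklaus/softmax-splatting) are illustrative assumptions, not the released implementation:

```python
import numpy as np

def infer_gs(I0, F01, C0, s, s_mid, splat_fn, eps=1e-6):
    """One direction of the RSSR inference path (a sketch, not the released
    code). I0: source RS frame (H, W, C); F01: optical flow I0 -> I1,
    (H, W, 2); C0: middle-scanline correlation map from the UNet, (H, W);
    splat_fn: a forward-warping (softmax splatting) operator."""
    U_mid = C0[..., None] * F01                    # scaling -> middle-scanline flow
    y0 = np.arange(I0.shape[0], dtype=F01.dtype)[:, None]  # source row indices
    # Explicit propagation to scanline s; the ratio is ill-conditioned only at
    # y0 == s_mid, where the undistortion flow is (near) zero anyway.
    ratio = (s - y0) / (s_mid - y0 + eps)
    U_s = ratio[..., None] * U_mid
    return splat_fn(I0, U_s)                       # forward-warp RS pixels to GS
```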
Compared to the isotropically smooth regular optical flow map, the RS undistortion flow map exhibits a pronounced scanline dependence. On the one hand, the RS undistortion flows near the target scanline appear in lighter colors (i.e., smaller warping displacements). On the other hand, the RS undistortion flows of pixels in rows above and below the target scanline show different colors (i.e., opposite warping directions).
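This behavior follows directly from the scaling sketched above: the per-pixel factor is proportional to $(s - y_0)$, so the displacement shrinks near the target scanline and flips sign across it. A toy check with illustrative numbers:

```python
s, H = 360, 720                       # target (middle) scanline of a 720-row image
for y0 in (100, 350, 360, 600):
    print(y0, s - y0)                 # 260, 10, 0, -240: magnitude shrinks near s,
                                      # sign flips across it
```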
Visual comparisons on the Fastec-RS test set. We zoom in on the correction results within the blue boxes. While other methods introduce various artifacts, our method produces the best results.
@inproceedings{fan_RSSR_ICCV21,
title={Inverting a rolling shutter camera: bring rolling shutter images to high framerate global shutter video},
author={Fan, Bin and Dai, Yuchao},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={4228--4237},
year={2021}
}
@article{fan_RSSR_PAMI23,
title={Rolling shutter inversion: bring rolling shutter images to high framerate global shutter video},
author={Fan, Bin and Dai, Yuchao and Li, Hongdong},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
volume={45},
number={5},
pages={6214--6230},
publisher={IEEE}
}