Differential SfM and Image Correction for a Rolling Shutter Stereo Rig

IVC 2022

Bin Fan, Yuchao Dai, Zhiyuan Zhang, Ke Wang

School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China

Abstract

Most modern consumer-grade cameras are equipped with an electronic rolling shutter (RS), which leads to image distortions when the camera moves during image acquisition. We are the first to explore the structure and motion estimation problem for a dynamic generalized RS stereo camera. Such a general configuration is commonplace in robotics and autonomous driving applications. We propose a tractable RS stereo differential structure from motion (SfM) algorithm that takes into account the RS effect during consecutive imaging and effectively compensates for the RS-stereo image distortion by a linear scaling operation on each optical flow vector. We further propose embedding the cheirality constraint into RANSAC and develop a robust RS-stereo-aware full-motion estimation framework. We demonstrate that the RS stereo motion and depth map, refined by our non-linear optimization schemes under the maximum likelihood criterion, can be used for image correction to recover high-quality global shutter (GS) stereo images. Moreover, when the proposed generalized RS stereo differential SfM pipeline is applied, the corrected images yield a 3D scene structure that is close to the ground truth. Extensive experiments on both synthetic and real RS stereo data demonstrate the effectiveness of our model and method in various configurations.

Contribution

We develop a simple and flexible generalized RS stereo differential SfM algorithm over two consecutive frames, and propose imposing the cheirality constraint to obtain a robust RANSAC-based RS stereo motion estimation pipeline.
We propose two effective and efficient RS-stereo non-linear optimization techniques based on the maximum likelihood criterion to refine the camera relative pose.
We advance an RS-stereo image correction method that removes the inaccuracies induced by the RS effect and yields 3D scene geometry close to the ground truth.
The proposed model and method are universal and tractable. They neither require the two cameras of the RS stereo rig to have identical specifications nor require strict time synchronization and scanline alignment between the left and right RS cameras.
Extensive experiments on both simulated data and real RS stereo images show the effectiveness and robustness of our proposed method.

Illustration of the generalized RS stereo configuration

Illustration of the exposure, readout, idle, and delay mechanisms of the generalized RS stereo camera across two consecutive frames. The sensor is exposed and read out row by row at a constant speed. Assuming the camera exposure is instantaneous, the frame time $\tau_i$ of each RS camera $i \in \{l, r\}$ consists of the readout time $\tau^a_i$ and the idle time $\tau^b_i$. Moreover, there is a calibrated delay time $\tau^d$ between the exposure start times of the left and right cameras.
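The timing model in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `RSCamera`, its field names, and the helper `elapsed_between_frames` are hypothetical, the first row of frame 0 of the reference camera is taken as time zero, and the delay $\tau^d$ is applied as a constant offset to one camera.

```python
from dataclasses import dataclass


@dataclass
class RSCamera:
    """Hypothetical container for the per-camera timing parameters."""
    height: int          # number of image rows (scanlines)
    readout_time: float  # tau^a: time to read out all rows, in seconds
    idle_time: float     # tau^b: idle time after readout, in seconds
    delay: float = 0.0   # tau^d: exposure start offset of this camera, in seconds

    @property
    def frame_time(self) -> float:
        # tau = tau^a + tau^b, as stated in the caption
        return self.readout_time + self.idle_time

    def row_time(self, frame: int, row: float) -> float:
        # Rows are read at constant speed, so row `row` is captured a
        # fraction row/height of the readout time after the frame starts.
        return (self.delay
                + frame * self.frame_time
                + (row / self.height) * self.readout_time)


def elapsed_between_frames(cam: RSCamera, row_first: float, row_second: float) -> float:
    """Time between seeing a point on `row_first` in frame 0 and on
    `row_second` in frame 1 of the same camera."""
    return cam.row_time(1, row_second) - cam.row_time(0, row_first)


# Illustrative numbers only (not taken from the paper).
left = RSCamera(height=480, readout_time=0.02, idle_time=0.02)
right = RSCamera(height=480, readout_time=0.02, idle_time=0.02, delay=0.01)

print(left.frame_time)                        # frame time of the left camera
print(right.row_time(0, 0))                   # right camera starts tau^d later
print(elapsed_between_frames(left, 0, 240))   # longer than one frame time
```

Because a point that moves down the image is captured later within the second frame, the elapsed time between its two observations differs from the nominal frame time. A per-flow correction of the kind described in the abstract would depend on this row-dependent elapsed time; the exact scaling used in the paper is not reproduced here.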

Overview of our generalized RS-stereo differential SfM and image correction pipeline

From two consecutive general RS stereo images (a), we establish a generalized RS stereo model to robustly recover the RS stereo motion (b) and the 3D scene structure (c), and then achieve high-quality RS stereo correction (d), in which the red tilted poles in the foreground are repaired. Note that only an example with standard RS stereo images is shown here.

Quantitative comparison

Quantitative evaluation for generalized RS stereo configuration under various settings: (a) varying the image resolution of the right RS camera, (b) varying the FPS of the right RS camera, (c) varying the exposure delay time of the right RS camera, (d) randomly varying the image resolution, FOV, and readout time ratio of the right RS camera separately.

Qualitative comparison on synthetic data

Qualitative results on synthetic data with various RS stereo configurations. In the first two columns, we show both left and right RS images to illustrate the generality of our RS stereo configuration. The last two columns show the residual images, i.e., the absolute difference between the original or corrected RS image and the ground-truth GS image. From top to bottom: (a) Standard RS stereo camera with the same orientation. (b) RS stereo camera with a vertical orientation. (c) RS stereo camera with opposite orientation. (d) The right camera has a higher frame rate of 60 Hz. (e) The right camera delays exposure by 1/60 seconds. (f) The right camera has a larger image resolution of 1200$\times$1200. The results show that our method performs well across these setups at removing RS distortion and estimating RS depth maps, as indicated by the darker residual images in the last column. Images have been scaled for visualization.
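The residual images described in this caption are per-pixel absolute differences, which can be computed as in the following sketch. It assumes the images are NumPy arrays of identical shape; the function name `residual_image` is hypothetical and the pixel values are illustrative only.

```python
import numpy as np


def residual_image(image, gs_reference):
    """Absolute per-pixel difference between an image and the GS reference."""
    # Cast to float first: subtracting uint8 arrays directly would wrap around.
    a = np.asarray(image, dtype=np.float32)
    b = np.asarray(gs_reference, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return np.abs(a - b)


# Tiny illustrative example with made-up pixel values.
rs = np.array([[10, 200], [50, 50]], dtype=np.uint8)
gs = np.array([[20, 100], [50, 60]], dtype=np.uint8)

res = residual_image(rs, gs)
print(res)         # per-pixel residual
print(res.mean())  # a darker (smaller) residual means a closer match to GS
```

A lower mean residual for the corrected image than for the original RS image corresponds to the darker residual images in the last column of the figure.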

Qualitative results on real data

Qualitative results on real data collected by a UAV. In practice, our method reconstructs an accurate 3D structure and removes the undesired RS distortion for the generalized RS stereo configuration.