Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction

CVPR 2023

Bin Fan, Yuxin Mao¹, Yuchao Dai, Zhexiong Wan¹, Qi Liu

School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China

Abstract

Rolling shutter correction (RSC) is becoming increasingly popular for rolling shutter (RS) cameras, which are widely used in commercial and industrial applications. Despite their promising performance, existing RSC methods typically employ a two-stage network structure that ignores intrinsic information interactions and hinders fast inference. In this paper, we propose a single-stage encoder-decoder network, named JAMNet, for efficient RSC. It first extracts pyramid features from consecutive RS inputs, and then simultaneously refines two complementary types of information (i.e., global shutter appearance and undistortion motion field) in a joint learning decoder so that they promote each other. To inject sufficient motion cues for guiding joint learning, we introduce a transformer-based motion embedding module and propose to pass hidden states across pyramid levels. Moreover, we present a new data augmentation strategy, “vertical flip + inverse order”, to release the potential of the RSC datasets. Experiments on various benchmarks show that our approach surpasses the state-of-the-art methods by a large margin, especially with a 4.7 dB PSNR leap on real-world RSC.
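To put the reported PSNR gain in perspective, the short sketch below converts a PSNR difference into the equivalent ratio of mean squared errors, using the standard definition PSNR = 10 log10(peak² / MSE). The function names and the 8-bit peak value of 255 are illustrative assumptions, not part of the JAMNet code.

```python
import math


def psnr(mse: float, peak: float = 255.0) -> float:
    """PSNR in dB for a given mean squared error and peak pixel value."""
    return 10.0 * math.log10(peak * peak / mse)


def mse_ratio_from_psnr_gain(gain_db: float) -> float:
    """Factor by which the MSE shrinks for a PSNR gain of gain_db (same peak)."""
    return 10.0 ** (gain_db / 10.0)


ratio = mse_ratio_from_psnr_gain(4.7)
print(f"A 4.7 dB PSNR gain corresponds to a {ratio:.2f}x lower MSE")
```

Under this definition, a 4.7 dB gain means the mean squared error against the ground-truth GS image is roughly three times lower.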

Contribution

We propose a tractable single-stage architecture to jointly perform GS appearance refinement and undistortion motion estimation for efficient RS correction.
We develop a general data augmentation strategy, i.e., vertical flip and inverse order, to maximize the exploration of the RS correction datasets.
Experiments show that our approach not only achieves SOTA RSC accuracy, but also enjoys a fast inference speed and a flexible and compact network structure.
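The “vertical flip + inverse order” augmentation can be sketched in a few lines. This page does not spell out the implementation, so the following is only one plausible reading: with an idealized top-to-bottom readout, later rows are captured later; flipping a frame vertically reverses that relation, and playing the clip backwards reverses it again, so the augmented clip still looks like a top-to-bottom RS sequence. The function name `vflip_inverse_order` and the assumption that GS targets are aligned one-to-one with the RS frames are hypothetical.

```python
def vflip_inverse_order(rs_frames, gs_frames=None):
    """Apply 'vertical flip + inverse order' to a short RS clip.

    rs_frames: list of frames in temporal order. Each frame is a sequence of
        rows (nested lists or NumPy arrays both work), row 0 = first scanline.
    gs_frames: optional list of GS targets, assumed (hypothetically) to be
        aligned one-to-one with rs_frames so the same transform applies.
    """
    def transform(frames):
        # reversed(frames): inverse temporal order.
        # frame[::-1]: reverse the row axis, i.e. a vertical flip.
        return [frame[::-1] for frame in reversed(frames)]

    aug_rs = transform(rs_frames)
    aug_gs = transform(gs_frames) if gs_frames is not None else None
    return aug_rs, aug_gs


# Toy clip: two 3x2 "frames" whose values encode (frame, row).
frame0 = [[0, 0], [1, 1], [2, 2]]
frame1 = [[10, 10], [11, 11], [12, 12]]
aug_rs, aug_gs = vflip_inverse_order([frame0, frame1])
print(aug_rs)
```

Applying the transform twice returns the original clip, so it can be toggled at random per training sample without any extra bookkeeping.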

Network Architecture

Overall architecture of our JAMNet. It has three main components: a feature pyramid encoder, a transformer-based motion embedding module, and a joint appearance and motion decoder. After the hierarchical pyramid features are extracted, the transformer performs motion embedding to inject motion cues. A coarse-to-fine decoder then gradually refines the GS appearance and the motion field at the same time, until it synthesizes the final full-resolution GS image. A hidden state $h^{j}$ is also passed sequentially across pyramid levels.
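The control flow described above can be outlined as a schematic loop. This is a structural sketch only, not the authors' implementation: `encode`, `embed_motion`, and `decode_level` are hypothetical stand-ins for the three components, and the number of pyramid levels, the argument lists, and the initial hidden state of `None` are all assumptions made for illustration.

```python
def jamnet_style_forward(rs_frames, encode, embed_motion, decode_level):
    """Schematic single-stage, coarse-to-fine pass (hypothetical interfaces)."""
    # 1) Feature pyramid encoder: one feature per level, ordered fine -> coarse.
    pyramid = encode(rs_frames)

    # 2) Motion embedding: produces motion cues that guide the decoder.
    cues = embed_motion(pyramid)

    # 3) Joint decoder: walk the pyramid coarse -> fine, refining the GS
    #    appearance and the motion field together at every level, and
    #    carrying the hidden state h^j from one level to the next.
    gs, flow, hidden = None, None, None
    for features in reversed(pyramid):
        gs, flow, hidden = decode_level(features, cues, gs, flow, hidden)
    return gs, flow


# Toy stand-ins that only record the order in which levels are visited.
trace = []


def toy_encode(frames):
    return ["level0_full", "level1_half", "level2_quarter"]


def toy_embed(pyramid):
    return "motion_cues"


def toy_decode(features, cues, gs, flow, hidden):
    trace.append((features, hidden))
    step = 0 if hidden is None else hidden + 1
    return f"gs@{features}", f"flow@{features}", step


gs, flow = jamnet_style_forward(["rs_prev", "rs_next"],
                                toy_encode, toy_embed, toy_decode)
print(trace)
```

The point of the sketch is that a single loop yields both outputs at every level, in contrast to a two-stage design that first estimates motion and only afterwards synthesizes the GS image.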
