Self-supervised rigid transformation equivariance for accurate 3D point cloud registration

Pattern Recognition

Zhiyuan Zhang¹, Jiadai Sun¹, Yuchao Dai¹, Dingfu Zhou², Xibin Song², Mingyi He¹

¹Northwestern Polytechnical University ²Baidu Inc

Abstract

Transformation equivariance has been widely investigated in 3D point cloud representation learning as a way to obtain more informative descriptors: it explicitly formulates how the representation changes with respect to a transformation of the input point clouds. In this paper, we extend this property to the task of 3D point cloud registration and propose rigid transformation equivariance (RTE) for accurate 3D point cloud registration. Specifically, RTE explicitly formulates how the relative pose changes with respect to rigid transformations of the input point clouds. To exploit RTE, we adopt a Siamese network structure with two shared registration branches. One branch takes the input pair of point clouds, and the other takes a new pair obtained by applying two random rigid transformations to the input point clouds respectively. Since the change between the two output relative poses is determined by RTE, an additional self-supervised loss is obtained to supervise the training. This general network structure can be easily integrated with most learning-based point cloud registration frameworks to improve their performance. We adopt state-of-the-art virtual point-based pipelines as the shared branches, in which we propose data-driven matching based on a learned cost volume (LCV) in place of traditional hand-crafted matching strategies. Experimental evaluations on both synthetic and real datasets validate the effectiveness of the proposed framework.

Contribution

We construct a dedicated RTE in 3D point cloud registration and design a Siamese network structure instead of the traditional “single branch” network. Our RTE structure can be integrated with the learning-based frameworks easily to improve the registration performance.
We propose to learn the matching matrix from the LCV instead of the hand-crafted matching strategy, which is more effective and efficient.
Remarkable performance on several datasets, surpassing state-of-the-art methods, demonstrates the effectiveness of our proposed method.

Network Architecture

Illustration of our proposed 3D point cloud registration framework (RTE-structure). First, we augment the input point clouds $\mathcal{X}$ and $\mathcal{Y}$ with random rigid transformations to obtain $\mathcal{X}^\prime$ and $\mathcal{Y}^\prime$. Then, two shared branches estimate the relative poses $\mathbf{T}$ and $\mathbf{T}^\prime$ for the pairs $\{\mathcal{X},\mathcal{Y}\}$ and $\{\mathcal{X}^\prime,\mathcal{Y}^\prime\}$ respectively. Each branch network consists of a feature extractor, a point matching module, and a motion estimation module. The augmentation operation connects the inputs of the two branches and the inherent RTE constraint connects their outputs, so that a closed loop is constructed. This explicit correlation provides a self-supervised loss function without any extra ground truth information. The RTE-structure can be easily integrated into other learning-based point cloud registration methods.
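The closed loop described above can be sketched numerically. The following is a minimal illustration and not the authors' implementation. It assumes that $\mathbf{T}$ maps $\mathcal{X}$ onto $\mathcal{Y}$ as a 4x4 homogeneous matrix, and that the two augmentations are written $\mathbf{T}_x$ and $\mathbf{T}_y$ (names introduced here for illustration), so that $\mathcal{X}^\prime = \mathbf{T}_x\mathcal{X}$ and $\mathcal{Y}^\prime = \mathbf{T}_y\mathcal{Y}$. Under that convention the two poses are related by $\mathbf{T}^\prime = \mathbf{T}_y \mathbf{T} \mathbf{T}_x^{-1}$. The squared-residual penalty in `rte_consistency_loss` is likewise an assumed form; the loss actually used in the paper may differ.

```python
import numpy as np

def random_rigid_transform(rng, max_angle=np.pi / 4, max_shift=0.5):
    """Sample a 4x4 homogeneous rigid transform (hypothetical augmentation sampler)."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(-max_angle, max_angle)
    # Skew-symmetric matrix of the unit axis, used by Rodrigues' rotation formula.
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = rng.uniform(-max_shift, max_shift, size=3)
    return T

def apply_transform(T, points):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]

def expected_augmented_pose(T, T_x, T_y):
    """Pose between the augmented pair implied by the pose T of the original pair.

    Assumes Y = T X, X' = T_x X and Y' = T_y Y, hence Y' = (T_y T T_x^{-1}) X'.
    """
    return T_y @ T @ np.linalg.inv(T_x)

def rte_consistency_loss(T_pred, T_aug_pred, T_x, T_y):
    """Assumed penalty: squared deviation from identity of the closed-loop residual."""
    T_expected = expected_augmented_pose(T_pred, T_x, T_y)
    residual = np.linalg.inv(T_expected) @ T_aug_pred - np.eye(4)
    return float(np.sum(residual ** 2))
```

In a training setting, `T_pred` and `T_aug_pred` would be the outputs of the two shared branches, while `T_x` and `T_y` are known because they are sampled during augmentation. This is why the constraint needs no extra ground truth.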

Illustration of the branch registration network architecture. First, we extract point features for the source and target point clouds $\mathcal{X}$ and $\mathcal{Y}$. Then, in the matching stage, we construct the LCV by replicating and concatenating the features. A subsequent MLP and a row-wise softmax regress the LCV to the matching matrix. The virtual corresponding points of $\mathcal{X}$ are obtained by using the matching matrix to compute a weighted average over $\mathcal{Y}$. Finally, the Procrustes algorithm is used to estimate the rotation matrix $\mathbf{R}$ and translation vector $\mathbf{t}$. The right part shows the detailed matching matrix learning process.
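The matching and motion estimation stages described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' network. The two-layer MLP with ReLU, the weight names `w1`, `b1`, `w2`, `b2`, and all function names are hypothetical. The Procrustes step is written as the standard unweighted SVD (Kabsch) solution, whereas the actual pipeline may weight correspondences.

```python
import numpy as np

def build_cost_volume(feat_x, feat_y):
    """Replicate and concatenate per-point features into an (N, M, 2C) volume."""
    n, m = feat_x.shape[0], feat_y.shape[0]
    fx = np.repeat(feat_x[:, None, :], m, axis=1)  # (N, M, C): source features tiled over targets
    fy = np.repeat(feat_y[None, :, :], n, axis=0)  # (N, M, C): target features tiled over sources
    return np.concatenate([fx, fy], axis=-1)

def mlp_scores(volume, w1, b1, w2, b2):
    """Assumed two-layer MLP applied to every (source, target) cell of the volume."""
    hidden = np.maximum(volume @ w1 + b1, 0.0)  # ReLU, shape (N, M, H)
    return (hidden @ w2 + b2)[..., 0]           # one score per cell, shape (N, M)

def row_softmax(scores):
    """Row-wise softmax, so each source point gets a distribution over target points."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def procrustes(src, dst):
    """Least-squares R, t with dst ~ src @ R.T + t (SVD-based solution)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def register(points_x, points_y, feat_x, feat_y, w1, b1, w2, b2):
    """Features -> LCV -> matching matrix -> virtual points -> R, t."""
    volume = build_cost_volume(feat_x, feat_y)
    matching = row_softmax(mlp_scores(volume, w1, b1, w2, b2))  # (N, M), rows sum to 1
    virtual_y = matching @ points_y  # weighted average of Y for each point of X
    return procrustes(points_x, virtual_y)
```

Here `w1` has shape `(2C, H)` and `w2` has shape `(H, 1)`. Because every step is differentiable, an equivalent version written in a deep learning framework could be trained end to end, which is what makes the matching data-driven.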