VRNet: Learning the Rectified Virtual Corresponding Points for 3D Point Cloud Registration

IEEE Transactions on Circuits and Systems for Video Technology


Zhiyuan Zhang1, Jiadai Sun1, Yuchao Dai1, Bin Fan1, Mingyi He1

1School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China   

Abstract


3D point cloud registration is fragile to outliers, i.e., points that have no corresponding points in the other point cloud. To handle this problem, a widely adopted strategy is to estimate the relative pose based only on a set of accurate correspondences, obtained either by building correspondences on identified inliers or by selecting reliable correspondences. However, these approaches are usually complicated and time-consuming. By contrast, virtual point-based methods learn virtual corresponding points (VCPs) for all source points uniformly, without distinguishing outliers from inliers. Although this strategy is time-efficient, the learned VCPs usually exhibit serious collapse degeneration due to insufficient supervision and an inherent distribution limitation. In this paper, we propose to exploit the best of both worlds and present a novel robust 3D point cloud registration framework. We follow the idea of virtual point-based methods but learn a new type of virtual points called rectified virtual corresponding points (RCPs), defined as a point set with the same shape as the source and the same pose as the target. A pair of consistent point clouds, i.e., the source and the RCPs, is thus formed by rectifying VCPs to RCPs (VRNet), through which reliable correspondences between the source and the RCPs can be obtained accurately. Since the relative pose between the source and the RCPs equals the relative pose between the source and the target, the input point clouds can be registered naturally. Specifically, we first construct the initial VCPs by using an estimated soft matching matrix to perform a weighted average on the target points. Then, we design a correction-walk module to learn an offset that rectifies the VCPs to the RCPs, which effectively breaks the distribution limitation of the VCPs. Finally, we develop a hybrid loss function that enforces shape and geometric structure consistency between the learned RCPs and the source, providing sufficient supervision. Extensive experiments on several benchmark datasets demonstrate that our method achieves advanced registration performance and high time-efficiency simultaneously.
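
To make the VCP construction step concrete, the following is a minimal PyTorch sketch of how a soft matching matrix can produce virtual corresponding points as a weighted average of the target; the function name, tensor shapes, and softmax temperature are illustrative assumptions rather than the paper's exact implementation.

import torch

def virtual_corresponding_points(feat_src, feat_tgt, tgt, temperature=0.02):
    # feat_src: (N, C) source features; feat_tgt: (M, C) target features;
    # tgt: (M, 3) target points. The temperature value is an assumption.
    similarity = feat_src @ feat_tgt.t()                       # (N, M)
    # Row-wise softmax yields a soft matching matrix whose rows sum to 1.
    matching = torch.softmax(similarity / temperature, dim=1)
    # Each VCP is a convex combination of target points, so it cannot leave
    # the convex hull of the target -- the distribution limitation that the
    # correction-walk offset later removes.
    vcp = matching @ tgt                                       # (N, 3)
    return vcp, matching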


Contribution


  • We propose a point cloud registration method named VRNet that achieves high accuracy and high time-efficiency simultaneously. We present a new type of virtual points called RCPs, which maintain the same shape as the source and the same pose as the target, to help build reliable correspondences.
  • We design a novel correction-walk module in our VRNet to learn an offset that breaks the distribution limitation of the initial VCPs. In addition, a hybrid loss function is proposed to enhance the rigidity and geometric structure consistency between the learned RCPs and the source.
  • Remarkable results on benchmark datasets validate the superiority and effectiveness of our proposed method for robust 3D point cloud registration.

Motivation


[Figure: Motivation]

1) The degeneration of the learned corresponding points. Red and blue represent the source and the target, respectively; pink indicates the learned corresponding points, and the matching lines connect the source points to their corresponding points. Due to insufficient supervision, the corresponding points learned by DCP and RPMNet degenerate seriously. 2) Illustration of the distribution limitation of VCPs. Red and green represent the source and the target, respectively. In this case, only part of the corresponding points can be fitted by the VCPs, which are generated by performing a weighted average on the target. The corresponding points of the source points marked by the box can never be fitted, since the distribution of the VCPs is confined to the convex set of the target. 3) Illustration of our VRNet. The ① source and ④ target have different poses and different shapes (a broken tail in the source and a broken wing in the target, respectively). Existing methods learn degenerated VCPs, indicated in pink in ②. Conversely, our VRNet learns the RCPs indicated in ③, which maintain the same shape as the source and the same pose as the target, by unfolding the VCPs and rectifying the partiality of the wing. Hence, reliable correspondences between these consistent point clouds, i.e., the source and the RCPs, can be obtained easily since the influence of outliers has been eliminated. Further, the relative pose between the source and the RCPs, which is the same as the relative pose between the source and the target, can be solved accurately.


Network Architecture


[Figure: Network architecture]

The network architecture of our proposed VRNet. Given the source and the target, DGCNN and a Transformer are applied to extract point features. Then, a soft matching matrix is computed from the constructed similarity matrix. Virtual corresponding points and corresponding point features are obtained by using the matching matrix to perform a weighted average on the target point cloud and the target point features, respectively. To break the distribution limitation, a correction-walk module is proposed to learn an offset that amends the VCPs to the desired RCPs. Finally, the rigid transformation is solved by the Procrustes algorithm. The network is supervised by the proposed hybrid loss function, which enforces rigidity and geometric structure consistency between the learned RCPs and the source point cloud.
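
Since the final pose is solved by the Procrustes algorithm, a minimal SVD-based (Kabsch) solver is sketched below, assuming unweighted one-to-one correspondences between the source and the learned RCPs; the function and variable names are illustrative.

import torch

def procrustes(src, rcp):
    # Least-squares rigid transform (R, t) aligning src (N, 3) to rcp (N, 3).
    src_mean, rcp_mean = src.mean(dim=0), rcp.mean(dim=0)
    # Cross-covariance of the centered point sets; its SVD gives the rotation.
    H = (src - src_mean).t() @ (rcp - rcp_mean)
    U, S, Vh = torch.linalg.svd(H)
    # Reflection guard keeps det(R) = +1.
    d = torch.sign(torch.det(Vh.t() @ U.t()))
    R = Vh.t() @ torch.diag(torch.tensor([1.0, 1.0, d.item()])) @ U.t()
    t = rcp_mean - R @ src_mean
    return R, t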


Method Analysis


[Figure: Method analysis]

Visualization of the source point cloud (purple), the target point cloud (green), the learned VCPs (gray), the RCPs (blue), and the learned offsets (red lines). All point clouds are calibrated to the same pose for a clear comparison. The VCPs approximate the source as closely as possible but remain confined to the target distribution. The correction-walk module then amends the VCPs to the RCPs, which exhibit a distribution more consistent with the source than both the VCPs and the original target.
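
As a rough illustration of how such an offset could be predicted, the hypothetical module below maps concatenated source and corresponding-point features to a per-point 3D walk that is added to the VCPs; the layer widths and the input conditioning are assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class CorrectionWalk(nn.Module):
    # Hypothetical correction-walk head; layer widths are illustrative.
    def __init__(self, feat_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 3),
        )

    def forward(self, feat_src, feat_corr, vcp):
        # Predict a per-point offset and add it to the VCPs, yielding points
        # that are free to leave the convex hull of the target.
        offset = self.mlp(torch.cat([feat_src, feat_corr], dim=-1))
        return vcp + offset, offset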


Citation


@ARTICLE{zhang_vrnet_tcsvt_2022,
  title={VRNet: Learning the Rectified Virtual Corresponding Points for 3D Point Cloud Registration},
  author={Zhang, Zhiyuan and Sun, Jiadai and Dai, Yuchao and Fan, Bin and He, Mingyi},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2022},
  volume={32},
  number={8},
  pages={4997-5010}}