Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection

TCSVT 2023


Aixuan Li1, Yuxin Mao1, Jing Zhang2, Yuchao Dai1

1Northwestern Polytechnical University, China    2Australian National University, Australia

Abstract


In this paper, we present a weakly-supervised RGB-D salient object detection model trained with scribble supervision. Because this is a multimodal learning task, we focus on effective multimodal representation learning via inter-modal mutual information regularization. In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound as a mutual information minimization regularizer to encourage a disentangled representation of each modality for salient object detection. Based on our multimodal representation learning framework, we introduce an asymmetric feature extractor for our multimodal data, which proves more effective than the conventional symmetric backbone setting. We also introduce a multimodal variational auto-encoder as a stochastic prediction refinement technique, which takes pseudo labels from the first training stage as supervision and generates refined predictions. Experimental results on benchmark RGB-D salient object detection datasets verify the effectiveness of both our explicit multimodal disentangled representation learning method and the stochastic prediction refinement strategy, achieving performance comparable to state-of-the-art fully supervised models.
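To make the idea of a mutual information upper bound concrete, here is a minimal sketch of one common sampled estimator (a CLUB-style contrastive bound). It is an illustration only: this page does not state which bound the paper uses, and the function names, the unit-variance Gaussian q(y | x), and the identity predictor mu(x) = x are all assumptions made for the example. In a real model, mu would come from a learned network mapping RGB features to predicted depth features, and the bound is only guaranteed to hold when q approximates the true conditional well.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def club_upper_bound(mu, y):
    """Sampled CLUB-style estimate of an upper bound on I(x; y).

    mu[i] is the predicted mean of y given x_i under an assumed
    unit-variance Gaussian q(y | x); y[i] is the sample paired with x_i.
    """
    n = len(y)
    # log q(y_i | x_i), up to an additive constant, averaged over matched pairs
    positive = sum(-0.5 * sq_dist(y[i], mu[i]) for i in range(n)) / n
    # log q(y_j | x_i) averaged over all pairs, approximating the product of marginals
    negative = sum(
        -0.5 * sq_dist(y[j], mu[i]) for i in range(n) for j in range(n)
    ) / (n * n)
    return positive - negative

def demo(seed=0, n=64, dim=4):
    """Compare the estimate for fully dependent and independent toy features."""
    rng = random.Random(seed)
    rgb = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    depth_dependent = [list(v) for v in rgb]  # depth features copy the RGB features
    depth_independent = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    # Hypothetical predictor mu(x) = x, so the RGB features serve as predicted means.
    return (
        club_upper_bound(rgb, depth_dependent),
        club_upper_bound(rgb, depth_independent),
    )
```

Calling `demo()` returns two estimates: the first, for depth features that duplicate the RGB features, is clearly positive, while the second, for independent features, is much smaller. Minimizing such a term during training would push the two modality representations toward carrying less redundant information.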


Contributions


(1) We introduce a mutual information optimization method to explicitly model the contribution of RGB and depth for weakly-supervised RGB-D saliency detection;

(2) We present asymmetric feature extractors, taking advantage of different backbones' encoding abilities to achieve more reliable feature representation;

(3) We present a multimodal variational auto-encoder framework as a second-stage refinement step for the model's predictions, which proves more robust to the error propagation caused by pseudo labeling.
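The stochastic part of the refinement in contribution (3) can be illustrated with the two standard building blocks of a variational auto-encoder: reparameterized sampling of a latent code and the KL divergence between a diagonal Gaussian posterior and a standard normal prior. The sketch below is generic VAE machinery, not the paper's architecture; the function names and the choice of a standard normal prior are assumptions for illustration.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def sample_latents(mu, log_var, k, seed=0):
    """Draw k latent codes; each would be decoded into one candidate prediction."""
    rng = random.Random(seed)
    return [reparameterize(mu, log_var, rng) for _ in range(k)]
```

In a refinement stage of this kind, each sampled latent code would be decoded into one candidate saliency map, and the candidates could then be aggregated, for example by averaging. Sampling several codes is what makes the prediction stochastic rather than a single deterministic output.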


Network Architecture


Architecture

Benchmarking results of SOD models.


Performance comparison of SOD.

Visualization of the generated saliency maps from benchmark RGB-D saliency detection models and ours.



Citation


@article{li2023mutual,
  title={Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection},
  author={Aixuan Li and Yuxin Mao and Jing Zhang and Yuchao Dai},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2023}
}