A General Divergence Modeling Strategy for Salient Object Detection

ACCV 2022


Xinyu Tian¹, Jing Zhang², Yuchao Dai¹

¹Northwestern Polytechnical University    ²Australian National University

Abstract


Salient object detection is subjective in nature, which implies that multiple estimations should be related to the same input image. Most existing salient object detection models are deterministic and follow a point-to-point estimation learning pipeline, making them incapable of estimating the predictive distribution. Although stochastic prediction networks based on latent variable models exist to model prediction variants, a latent space built on a single clean saliency annotation is less reliable for exploring the subjective nature of saliency, leading to less effective saliency "divergence modeling". Given multiple saliency annotations, we introduce a general divergence modeling strategy via random sampling, and apply it to an ensemble-based framework and three latent-variable-model-based solutions to explore the "subjective nature" of saliency. Experimental results demonstrate the superior performance of our general divergence modeling strategy.


Overview Video



Poster


Architecture


Model Architecture


The proposed strategy within the ensemble-based framework (left) and the latent-variable-model-based solutions (right). By randomly selecting one ground truth from the multiple annotations for each model update, the proposed strategy makes better use of the multiple annotations to explore the subjective nature of the human visual system.
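The random selection described in the caption can be illustrated with a minimal sketch. It assumes a dataset stored as `(image, annotations)` pairs and uniform sampling over annotators; the function names `sample_ground_truth` and `training_targets` are hypothetical and are not taken from the authors' code.

```python
import random


def sample_ground_truth(annotations, rng=random):
    """Pick one annotation uniformly at random (uniform sampling is an assumption)."""
    if not annotations:
        raise ValueError("at least one annotation is required")
    return rng.choice(annotations)


def training_targets(dataset, num_epochs, seed=0):
    """Yield (image, target) pairs, re-sampling the target on every visit.

    `dataset` is a list of (image, annotations) pairs, where `annotations`
    holds the saliency maps from the different annotators.
    """
    rng = random.Random(seed)
    for _ in range(num_epochs):
        for image, annotations in dataset:
            # A different annotator's map may supervise the same image
            # in different epochs.
            yield image, sample_ground_truth(annotations, rng)
```

Because the target is re-drawn on every visit, the model sees the full set of annotations for an image over the course of training rather than a single fused map.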


Uncertainty Comparison with Single Annotation


Uncertainty maps of latent variable models with a single majority voting ground truth ("_M") and with multiple (M = 5) diverse ground truths using our divergence modeling strategy ("_R").
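This page does not state how the uncertainty maps are computed. As a sketch of one common choice, the example below takes several stochastic predictions for the same image, averages them per pixel, and uses the binary entropy of the mean as the uncertainty. Both the entropy measure and the function names are assumptions made for illustration.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli variable with probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def uncertainty_map(predictions):
    """Per-pixel entropy of the mean prediction.

    `predictions` is a list of saliency maps for one image, each a list of
    rows holding probabilities in [0, 1]. All maps share the same size.
    """
    num = len(predictions)
    height, width = len(predictions[0]), len(predictions[0][0])
    mean = [
        [sum(p[r][c] for p in predictions) / num for c in range(width)]
        for r in range(height)
    ]
    return [[binary_entropy(v) for v in row] for row in mean]
```

Under this measure, pixels where the sampled predictions disagree have a mean near 0.5 and therefore high entropy, while pixels where all samples agree have entropy near zero.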


Uncertainty w.r.t. Diversity of Annotation


Uncertainty map comparison w.r.t. the number of annotations for each image.


Applying Our Solution to SOTA SOD Model


Uncertainty maps of a SOTA SOD model (PFSNet in particular) with the proposed general divergence modeling strategy, where "-E", "-G", "-V" and "-A" indicate the corresponding deep ensemble, GAN, VAE and ABP based models. The first column shows the image and the majority voting ground truth; from the second column to the last, we show the prediction (top) and the corresponding predictive uncertainty (bottom).


Generated Saliency Maps and Uncertainty Maps


Generated diverse saliency maps. For each example, the first row shows, from left to right, the input image, the ground truth map after majority voting, and the ground truth maps from 5 different annotators; the second row shows the predicted uncertainty map, the prediction of our majority voting branch, and five generated diverse saliency maps.
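The majority voting ground truth mentioned in the caption can be sketched as a per-pixel vote over the binary annotator masks. The strict-majority rule, under which a pixel is salient only if more than half of the annotators mark it, and the function name `majority_vote` are assumptions; the page does not specify how ties are handled.

```python
def majority_vote(annotations):
    """Fuse binary masks into one mask by per-pixel strict majority.

    `annotations` is a list of masks, each a list of rows of 0/1 values,
    all with the same size.
    """
    num = len(annotations)
    height, width = len(annotations[0]), len(annotations[0][0])
    return [
        [
            # Salient only if strictly more than half of the annotators agree.
            1 if 2 * sum(a[r][c] for a in annotations) > num else 0
            for c in range(width)
        ]
        for r in range(height)
    ]
```

With five annotators, as in the figure, ties cannot occur, so the tie-breaking assumption only matters for an even number of annotators.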


Citation


@inproceedings{tian2022general,
  title={A General Divergence Modeling Strategy for Salient Object Detection},
  author={Tian, Xinyu and Zhang, Jing and Dai, Yuchao},
  booktitle={Proceedings of the Asian Conference on Computer Vision},
  pages={2406--2424},
  year={2022}
}