1Northwestern Polytechnical University
2Australian National University
Salient object detection is subjective in nature, which implies that multiple estimations can correspond to the same input image. Most existing salient object detection models are deterministic and follow a point-to-point estimation learning pipeline, making them incapable of estimating the predictive distribution. Although stochastic prediction networks based on latent variable models exist to model the prediction variants, a latent space learned from a single clean saliency annotation is less reliable for exploring the subjective nature of saliency, leading to less effective saliency "divergence modeling". Given multiple saliency annotations, we introduce a general divergence modeling strategy via random sampling, and apply our strategy to an ensemble based framework and three latent variable model based solutions to explore the "subjective nature" of saliency. Experimental results demonstrate the superior performance of our general divergence modeling strategy.
The proposed strategy within the ensemble based framework (left) and the latent variable model based solutions (right). By randomly selecting one ground truth from the multiple annotations for each model update, the proposed strategy can better exploit the multiple annotations to explore the human visual system.
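The random selection described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_ground_truth`, the array layout `(batch, M, H, W)`, and the use of uniform sampling per image are all assumptions made for the example.

```python
import numpy as np

def sample_ground_truth(annotations, rng):
    """Pick one annotation uniformly at random for each image in a batch.

    annotations: array of shape (batch, M, H, W) with M maps per image
                 (layout is an assumption of this sketch).
    Returns an array of shape (batch, H, W) to use as the training target.
    """
    batch, num_annotators = annotations.shape[:2]
    # Draw one independent annotator index per image.
    idx = rng.integers(0, num_annotators, size=batch)
    return annotations[np.arange(batch), idx]

# Toy data: 4 images, M = 5 annotators, 8x8 binary maps.
rng = np.random.default_rng(0)
annotations = (rng.random((4, 5, 8, 8)) > 0.5).astype(np.float32)
target = sample_ground_truth(annotations, rng)
```

Calling the function again at every training step yields a different target for the same image over time, which is what exposes the model to the disagreement between annotators.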
Uncertainty maps of latent variable models with single majority voting GT ("_M") and multiple (M = 5) diverse GT using our divergence modeling strategy ("_R").
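For reference, a majority voting ground truth can be derived from multiple binary annotations as in the sketch below. The helper name `majority_vote` and the strict-majority tie rule are assumptions of this example; the source does not specify how ties are handled.

```python
import numpy as np

def majority_vote(annotations):
    """Fuse M binary maps of shape (M, H, W) into one map of shape (H, W).

    A pixel is marked salient when strictly more than half of the
    annotators marked it (tie handling is an assumption of this sketch).
    """
    votes = annotations.sum(axis=0)
    return (votes > annotations.shape[0] / 2).astype(annotations.dtype)

# Three annotators, a 1x2 image: the first pixel gets 2 votes, the second 1.
toy = np.array([[[1, 0]], [[1, 1]], [[0, 0]]], dtype=np.uint8)
fused = majority_vote(toy)
```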
Uncertainty map comparison w.r.t. the number of annotations for each image.
Uncertainty maps of a SOTA SOD model (PFSNet in particular) with the proposed general divergence modeling strategy, where "-E", "-G", "-V" and "-A" indicate the corresponding deep ensemble, GAN, VAE and ABP based models. The first column shows the image and the majority voting ground truth; from the second column to the last, we show the prediction (top) and the corresponding predictive uncertainty (bottom).
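One common way to turn several stochastic predictions into a single prediction and an uncertainty map is to average them and take the per-pixel entropy of the mean. The sketch below assumes that measure; the source does not state which uncertainty definition is used, and the function name `predictive_uncertainty` is hypothetical.

```python
import numpy as np

def predictive_uncertainty(predictions, eps=1e-8):
    """Mean prediction and per-pixel binary entropy of that mean.

    predictions: array of shape (S, H, W) with values in [0, 1], one map
                 per ensemble member or latent sample.
    Entropy of the mean is an assumed uncertainty measure for this sketch.
    """
    mean = predictions.mean(axis=0)
    entropy = -(mean * np.log(mean + eps) + (1.0 - mean) * np.log(1.0 - mean + eps))
    return mean, entropy

# Pixel 0: all samples agree (low uncertainty).
# Pixel 1: samples split evenly (maximum uncertainty, about log 2).
samples = np.array([[[0.0, 0.0]], [[0.0, 1.0]]])
mean_map, uncertainty_map = predictive_uncertainty(samples)
```

Pixels on which the samples disagree receive high entropy, which matches the intuition that uncertainty should concentrate where annotators differ.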
Generated diverse saliency maps. For each example, the first row shows, from left to right, the input image, the ground truth map after majority voting, and the ground truth maps from 5 different annotators; the second row shows the predicted uncertainty map, the prediction of our majority voting branch, and five generated diverse saliency maps.
@inproceedings{tian2022general,
title={A General Divergence Modeling Strategy for Salient Object Detection},
author={Tian, Xinyu and Zhang, Jing and Dai, Yuchao},
booktitle={Proceedings of the Asian Conference on Computer Vision},
pages={2406--2424},
year={2022}
}