A General Divergence Modeling Strategy for Salient Object Detection

ACCV 2022

Xinyu Tian¹, Jing Zhang², Yuchao Dai¹

¹Northwestern Polytechnical University ²Australian National University

Paper

Video

Code

Lab Page

Abstract

Salient object detection is subjective in nature, which implies that multiple estimations should be related to the same input image. Most existing salient object detection models are deterministic, following a point-to-point estimation learning pipeline, which makes them incapable of estimating the predictive distribution. Although stochastic prediction networks based on latent variable models exist to model the prediction variants, a latent space based on a single clean saliency annotation is less reliable in exploring the subjective nature of saliency, leading to less effective saliency "divergence modeling". Given multiple saliency annotations, we introduce a general divergence modeling strategy via random sampling, and apply our strategy to an ensemble-based framework and three latent-variable-model-based solutions to explore the "subjective nature" of saliency. Experimental results demonstrate the superior performance of our general divergence modeling strategy.

Model Architecture

The proposed strategy within the ensemble-based framework (left) and the latent-variable-model-based solutions (right). By randomly selecting one ground truth from the multiple annotations for each model update, the proposed strategy can better exploit the contribution of multiple annotations in exploring the human visual system.
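The random selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `sample_ground_truth` and `training_targets` are hypothetical, the dataset is assumed to be an iterable of `(image, annotations)` pairs, and uniform sampling is an assumption, since the text only says the ground truth is chosen randomly.

```python
import random


def sample_ground_truth(annotations, rng=random):
    """Pick one annotation at random (uniform sampling is an assumption)."""
    if not annotations:
        raise ValueError("need at least one annotation per image")
    return rng.choice(annotations)


def training_targets(dataset, rng=random):
    """Yield (image, target) pairs, drawing a fresh annotation on every visit.

    `dataset` is assumed to be an iterable of (image, annotations) pairs,
    where `annotations` is the list of saliency maps from different annotators.
    """
    for image, annotations in dataset:
        yield image, sample_ground_truth(annotations, rng)
```

Because a new annotation is drawn each time an image is visited, the model sees different annotators' opinions for the same image over the course of training, rather than a single fixed target.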

Uncertainty Comparison with Single Annotation

Uncertainty maps of latent variable models trained with a single majority-voting ground truth (GT) ("_M") and with multiple (M = 5) diverse GTs using our divergence modeling strategy ("_R").
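For reference, a majority-voting ground truth can be built from several binary annotations with a pixel-wise vote. The sketch below is one plausible reading, not the authors' code: the function name `majority_vote` is hypothetical, masks are assumed to be equally sized lists of rows of 0/1 values, and a strict majority is assumed (with M = 5 annotators no ties can occur).

```python
def majority_vote(masks):
    """Pixel-wise strict majority vote over binary masks.

    `masks` is a list of M masks, each a list of rows of 0/1 values.
    A pixel is salient in the result if more than half the masks mark it.
    """
    m = len(masks)
    if m == 0:
        raise ValueError("need at least one mask")
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(mask[r][c] for mask in masks) > m else 0
         for c in range(width)]
        for r in range(height)
    ]
```

A pixel marked salient by three of five annotators is kept, while one marked by only two is dropped, which is why the voted map hides the disagreement that the diverse annotations preserve.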

Applying Our Solution to SOTA SOD Model

Uncertainty maps of a SOTA SOD model (PFSNet in particular) with the proposed general divergence modeling strategy, where "-E", "-G", "-V" and "-A" indicate the corresponding deep ensemble, GAN, VAE and ABP based models, respectively. The first column shows the image and the majority voting ground truth; from the second column to the last, we show the prediction (top) and the corresponding predictive uncertainty (bottom).

Generated Saliency Maps and Uncertainty Maps

Generated diverse saliency maps. For each example, the first row shows, from left to right, the input image, the ground truth map after majority voting, and the ground truth maps from 5 different annotators; the second row shows the predicted uncertainty map, the prediction of our majority voting branch, and five generated diverse saliency maps.
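One common way to turn several diverse predictions into a single uncertainty map is to take the entropy of their pixel-wise mean. The page does not state how its uncertainty maps are computed, so the sketch below is an assumption for illustration only; the names `binary_entropy` and `predictive_uncertainty` are hypothetical, and predictions are assumed to be equally sized lists of rows of probabilities in [0, 1].

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli variable with probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def predictive_uncertainty(predictions):
    """Return (mean map, entropy map) for a list of saliency predictions.

    Entropy of the mean prediction is an assumed uncertainty measure,
    not necessarily the one used to produce the figures on this page.
    """
    n = len(predictions)
    if n == 0:
        raise ValueError("need at least one prediction")
    height, width = len(predictions[0]), len(predictions[0][0])
    mean = [
        [sum(pred[r][c] for pred in predictions) / n for c in range(width)]
        for r in range(height)
    ]
    uncertainty = [[binary_entropy(p) for p in row] for row in mean]
    return mean, uncertainty
```

Under this measure, pixels on which the sampled predictions disagree (mean near 0.5) receive high uncertainty, while pixels on which they agree receive values close to zero.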