Modality-Guided Subnetwork for Salient Object Detection
Zongwei WU, Guillaume Allibert, Christophe Stolz, Chao Ma and Cedric Demonceaux
Convolutional neural networks (CNNs) have demonstrated their effectiveness in salient object detection (SOD). Nevertheless, their ability to learn geometry from RGB images is limited by the fixed design of kernel shape and size, which cannot account for visual appearances with large deformations. Recently, RGBD-based SOD has attracted research attention. Extra depth cues such as boundaries, surface normals, and shape attributes contribute to identifying salient objects in complicated scenarios. However, most RGBD networks require multiple modalities at the input and feed them separately through a two-stream design, which inevitably incurs extra costs in computation and depth sensing. To tackle these inconveniences, we present in this paper the modality-guided subnetwork (MGSnet), a novel fusion design that can be applied to both RGB and RGBD two-stream models. It has the following advantages: 1) It can rely on RGB images alone for both training and testing. Taking the inner workings of depth-prediction networks into account, we propose to estimate pseudo-geometry maps from the RGB input, essentially mimicking a multi-modality input. 2) Our proposed MGSnet for RGB SOD runs in real time while achieving state-of-the-art performance compared to other RGB models. 3) The flexible and lightweight design of MGSnet facilitates integration into RGBD two-stream models, enabling further gains at minimal cost.
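To illustrate the pseudo-modality idea in point 1, the minimal sketch below derives a geometry-like map from the RGB input itself and stacks it as an extra channel, mimicking an RGBD input without a depth sensor. The paper uses a learned depth-prediction network for this; here a trivial intensity heuristic is a self-contained stand-in, and all function names are hypothetical.

```python
# Hedged sketch: fake a depth modality from RGB alone, then fuse it
# with the colour channels to mimic a multi-modality (RGBD) input.
# A real system would replace estimate_pseudo_depth with a learned
# monocular depth network; the intensity heuristic below is only a
# placeholder so the example runs with the standard library.

def estimate_pseudo_depth(rgb):
    """Stand-in for a depth-prediction network: map each (r, g, b)
    pixel to a scalar in [0, 1] (mean intensity used as a proxy)."""
    return [[(r + g + b) / (3 * 255.0) for (r, g, b) in row] for row in rgb]

def fuse_modalities(rgb, depth):
    """Mimic a multi-modality input by appending the pseudo-depth
    value as a fourth channel alongside R, G, B."""
    return [
        [(r, g, b, d) for (r, g, b), d in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb, depth)
    ]

# Toy 2x2 "image": white, black, grey, red pixels.
rgb = [[(255, 255, 255), (0, 0, 0)],
       [(128, 128, 128), (255, 0, 0)]]
pseudo = estimate_pseudo_depth(rgb)
rgbd = fuse_modalities(rgb, pseudo)
```

The point of the sketch is only the data flow: the downstream network sees a 4-channel input whether the depth came from a sensor or was predicted from RGB, which is what lets the same fusion design serve both RGB and RGBD models.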