Mix3D: Out-of-Context Data Augmentation for 3D Scenes
Alexey Nekrasov, Jonas Schult, Or Litany, Bastian Leibe and Francis Engelmann
We present Mix3D, a data augmentation technique for segmenting large-scale 3D scenes that is robust to strong scene priors. As scene context helps reasoning about object semantics, current works focus on models with large capacity and receptive fields that can fully capture the global context of an input 3D scene. In this work, we focus on the importance of balancing global scene context and local object geometry, with the goal of avoiding overfitting to contextual priors in the training set. To this end, we propose Mix3D, which creates new training samples by mixing 3D scenes. By doing so, it implicitly places object instances into novel out-of-context environments. We perform a detailed analysis to understand the importance of global context, local geometry, and the effect of mixing scenes. In experiments, we show that models trained with Mix3D benefit from significant performance boosts on indoor (ScanNet, S3DIS) and outdoor datasets (SemanticKITTI). Trained with Mix3D, MinkowskiNet outperforms all prior state-of-the-art methods by a significant margin on the ScanNet test benchmark (78.1% mIoU).
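The core operation of mixing two scenes can be illustrated with a minimal sketch: each scene is centered at the origin and the union of both point clouds (with their labels) forms the new training sample. This is only a simplified illustration of the idea described above, not the authors' implementation; the function name `mix3d` and the `(N, 3)` NumPy array representation are assumptions.

```python
import numpy as np

def mix3d(points_a, labels_a, points_b, labels_b):
    """Mix two point-cloud scenes into one out-of-context training sample.

    points_*: float arrays of shape (N, 3); labels_*: int arrays of shape (N,).
    Illustrative sketch only; the actual pipeline also applies per-scene
    augmentations (rotation, flipping, etc.) before mixing.
    """
    # Center each scene at the origin so the two scenes spatially overlap.
    points_a = points_a - points_a.mean(axis=0)
    points_b = points_b - points_b.mean(axis=0)
    # The union of both clouds and their labels is the mixed sample;
    # objects from scene A now appear in the context of scene B and vice versa.
    mixed_points = np.concatenate([points_a, points_b], axis=0)
    mixed_labels = np.concatenate([labels_a, labels_b], axis=0)
    return mixed_points, mixed_labels
```

A training loop would call this on random scene pairs each epoch, so the same object is seen in ever-changing surroundings, weakening contextual shortcuts.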