Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements
Yu Rong, Jingbo Wang, Ziwei Liu and Chen Change Loy
3D interacting hand reconstruction is essential to facilitating human-machine interaction, e.g., understanding human emotions and how humans interact with their surroundings. Previous works in this field either rely on auxiliary inputs such as depth images, or can only handle a single hand when monocular RGB images are used. Single-hand methods tend to generate collided hand meshes when applied to closely interacting hands, since they cannot explicitly model the interactions between two hands. In this paper, we make the first attempt to take a single monocular RGB image as input and reconstruct 3D interacting hands as output, achieving precise 3D hand poses with minimal collision. This is made possible via a two-stage framework. Specifically, the first stage consists of a convolutional neural network that generates coarse predictions, tolerating collisions but encouraging pose-accurate hand meshes. The second stage progressively ameliorates the collision problem through a series of factorized refinements while retaining the precision of the 3D poses. We carefully investigate potential implementations of the factorized refinement, considering the trade-off between efficiency and accuracy. Extensive quantitative and qualitative results on large-scale datasets such as InterHand2.6M demonstrate the effectiveness of the proposed approach. Code and models will be made publicly available.