A Spatio-temporal Transformer for 3D Human Motion Prediction


Emre Aksan, Manuel Kaufmann, Peng Cao and Otmar Hilliges


We propose a novel Transformer-based architecture for the task of generative modeling of 3D human motion. Previous work commonly relies on RNN-based models considering shorter forecast horizons reaching a stationary and often implausible state quickly. Recent studies show that implicit temporal representations in the frequency domain are also effective in making predictions for a pre-determined horizon. Our focus lies on learning spatio-temporal representations autoregressively and hence the generation of plausible future developments over both the short and long term. The proposed model learns high dimensional embeddings for skeletal joints and how to compose a temporally coherent pose via a decoupled temporal and spatial self-attention mechanism. Our dual attention concept allows the model to access current and past information directly and to capture both the structural and the temporal dependencies explicitly. We show empirically that this effectively learns the underlying motion dynamics and reduces error accumulation over time observed in auto-regressive models. Our model is able to make accurate short-term predictions and generate plausible motion sequences over long horizons.

PDF (protected)

  Important Dates

All deadlines are 23:59 Pacific Time (PT). No extensions will be granted.

Paper registration July 23 30, 2021
Paper submission July 30, 2021
Supplementary August 8, 2021
Tutorial submission August 15, 2021
Tutorial notification August 31, 2021
Rebuttal period September 16-22, 2021
Paper notification October 1, 2021
Camera ready October 15, 2021
Demo submission July 30 Nov 15, 2021
Demo notification Oct 1 Nov 19, 2021
Tutorial November 30, 2021
Main conference December 1-3, 2021