Keynote Speakers

Michael Black

Chaired by:
Djamila Aouada
Michael Black - Keynote Speaker at #3DV 2021
Learning digital humans for the Metaverse

The Metaverse will require artificial humans that interact with real humans as well as with real and virtual 3D worlds. This requires a real-time understanding of humans and scenes as well as the generation of natural and appropriate behavior. We approach the problem of creating such embodied human behavior through capture, modeling, and synthesis. First, we learn realistic and expressive 3D human avatars from 3D scans. We then train neural networks to estimate human pose and shape from images and video. Specifically, we focus on humans interacting with each other and the 3D world. By capturing people in action, we are able to train neural networks to model and generate human movement and human-scene interaction. To validate our models, we synthesize virtual humans in novel 3D scenes. The goal is to produce realistic human avatars that interact with virtual worlds in ways that are indistinguishable from real humans.


Michael Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. from Yale University (1992). He has held positions at the University of Toronto, Xerox PARC, and Brown University. He is one of the founding directors at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is a Distinguished Amazon Scholar and an Honorarprofessor at the University of Tübingen. His work has won several awards including the IEEE Computer Society Outstanding Paper Award (1991), Honorable Mention for the Marr Prize (1999 and 2005), the 2010 Koenderink Prize, the 2013 Helmholtz Prize, and the 2020 Longuet-Higgins Prize. He is a member of the German National Academy of Sciences Leopoldina and a foreign member of the Royal Swedish Academy of Sciences. In 2013 he co-founded Body Labs Inc., which was acquired by Amazon in 2017.

Katerina Fragkiadaki

Chaired by:
Gerard Pons-Moll
Modular 3D neural scene representations for visuomotor control and language grounding

Current state-of-the-art perception models localize rare object categories in images, yet often miss basic facts that a two-year-old has mastered: that objects have 3D extent, that they persist over time despite changes in the camera view, and that they do not intersect in 3D, among others. We will discuss models that learn to map 2D and 2.5D images and videos into amodally completed 3D feature maps of the scene and the objects in it by predicting views. We will show that the proposed models learn object permanence, have objects emerge in 3D without human annotations, can ground language in 3D visual simulations, and learn intuitive physics and controllers that generalize across scene arrangements and camera configurations. In this way, the proposed world-centric scene representations overcome many limitations of image-centric representations for video understanding, model learning, and language grounding.


Katerina Fragkiadaki is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. She received her Ph.D. from the University of Pennsylvania and was subsequently a postdoctoral fellow at UC Berkeley and Google Research. Her work is on learning visual representations with little supervision and on incorporating spatial reasoning into deep visual learning. Her group develops algorithms for mobile computer vision and for learning physics and common sense for agents that move around and interact with the world. Her work has been recognized with a best Ph.D. thesis award, an NSF CAREER award, an AFOSR Young Investigator award, a DARPA Young Investigator award, and faculty research awards from Google, TRI, Amazon, and Sony.

Richard Newcombe

Chaired by:
Shohei Nobuhara


Richard Newcombe is Director of Research at Facebook Reality Labs. His team at FRL-R is developing LiveMaps - a new generation of always-on 3D computer vision and machine perception technologies, devices, and infrastructure to unlock the potential of Augmented Reality and Contextualized AI. He received his PhD from Imperial College London, followed by a Postdoctoral Fellowship at the University of Washington, and went on to co-found Surreal Vision, which was acquired by Facebook in 2015. His original research introduced the Dense SLAM paradigm demonstrated in KinectFusion, DTAM, and DynamicFusion, impacting a generation of real-time and interactive systems being developed in the emerging fields of AR/VR and robotics. His interests span sub-disciplines across machine perception and machine learning, from hardware-software sensor device co-design to computer vision algorithms and novel infrastructure research.

Imari Sato

Chaired by:
Silvia Zuffi


Imari Sato received the BS degree in policy management from Keio University in 1994. After studying at the Robotics Institute of Carnegie Mellon University as a visiting scholar, she received the MS and Ph.D. degrees in Interdisciplinary Information Studies from the University of Tokyo in 2002 and 2005, respectively. In 2005, she joined the National Institute of Informatics, where she is currently a professor. Concurrently, she serves as a visiting professor at the Tokyo Institute of Technology and a professor at the University of Tokyo. Her primary research interests are in computer vision (physics-based vision, spectral analysis, image-based modeling). She has received various research awards, including The Young Scientists’ Prize from The Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (2009), and the Microsoft Research Japan New Faculty Award (2011).