Keynote Speakers

Michael Black

Chaired by
Djamila Aouada
Michael Black - Keynote Speaker at #3DV 2021
Learning digital humans for the Metaverse

The Metaverse will require artificial humans that interact with real humans as well as with real and virtual 3D worlds. This requires a real-time understanding of humans and scenes as well as the generation of natural and appropriate behavior. We approach the problem of creating such embodied human behavior through capture, modeling, and synthesis. First, we learn realistic and expressive 3D human avatars from 3D scans. We then train neural networks to estimate human pose and shape from images and video. Specifically, we focus on humans interacting with each other and the 3D world. By capturing people in action, we are able to train neural networks to model and generate human movement and human-scene interaction. To validate our models, we synthesize virtual humans in novel 3D scenes. The goal is to produce realistic human avatars that interact with virtual worlds in ways that are indistinguishable from real humans.


Michael Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. from Yale University (1992). He has held positions at the University of Toronto, Xerox PARC, and Brown University. He is one of the founding directors of the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is a Distinguished Amazon Scholar and an Honorarprofessor at the University of Tübingen. His work has won several awards, including the IEEE Computer Society Outstanding Paper Award (1991), Honorable Mention for the Marr Prize (1999 and 2005), the 2010 Koenderink Prize, the 2013 Helmholtz Prize, and the 2020 Longuet-Higgins Prize. He is a member of the German National Academy of Sciences Leopoldina and a foreign member of the Royal Swedish Academy of Sciences. In 2013 he co-founded Body Labs Inc., which was acquired by Amazon in 2017.

Katerina Fragkiadaki

Chaired by
Gerard Pons-Moll
Modular 3D neural scene representations for visuomotor control and language grounding

Current state-of-the-art perception models can localize rare object categories in images, yet they often miss basic facts that a two-year-old has mastered: objects have 3D extent, they persist over time despite changes in camera view, and they do not intersect in 3D. We will discuss models that learn to map 2D and 2.5D images and videos into amodally completed 3D feature maps of the scene and the objects in it by predicting views. We will show that the proposed models learn object permanence, have objects emerge in 3D without human annotations, can ground language in 3D visual simulations, and learn intuitive physics and controllers that generalize across scene arrangements and camera configurations. In this way, the proposed world-centric scene representations overcome many limitations of image-centric representations for video understanding, model learning, and language grounding.


Katerina Fragkiadaki is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. She received her Ph.D. from the University of Pennsylvania and was subsequently a postdoctoral fellow at UC Berkeley and Google Research. Her work focuses on learning visual representations with little supervision and on incorporating spatial reasoning into deep visual learning. Her group develops algorithms for mobile computer vision and for learning physics and common sense for agents that move around and interact with the world. Her work has been recognized with a Best Ph.D. Thesis Award, an NSF CAREER award, an AFOSR Young Investigator award, a DARPA Young Investigator award, and Google, TRI, Amazon, and Sony faculty research awards.

Richard Newcombe

Chaired by
Silvia Zuffi
Foundational Predictive 3D Models of Reality in the Metaverse, Robotics and the Future of AI

Future robots, AI assistants, and mixed reality in the Metaverse will be greatly enhanced by rich predictive 3D models with far greater context about our physical reality than is possible today. Such models will require a live, continuously updating model of reality that captures what is in our environments and how we interact within them. To obtain this context, we will need to observe reality at greater sensing fidelity than has previously been possible, wherever we are, whenever we need it. Upcoming generations of robots and wearable devices such as AR glasses enable always-on mobile sensing, computation, and communication that push the observability of reality to the limit, observing reality change as a user or robot interacts with the environment. In this talk I will provide an overview of the key stages in building a scalable, predictive 3D model of reality that is kept up to date by egocentric data from AR glasses and future robots. I'll chart the course in my research that has persuaded me this is the foundation on which contextualized AI can be built to power the future of robotics, AI assistants, and mixed reality worlds in the Metaverse.


Richard Newcombe is Director of Research at Reality Labs Research, Meta. His team at RL-R is developing LiveMaps, a new generation of always-on 3D computer vision and machine perception technologies, devices, and infrastructure to unlock the potential of Augmented Reality and Contextualized AI. He received his PhD from Imperial College London and held a Postdoctoral Fellowship at the University of Washington before co-founding Surreal Vision, which was acquired by Facebook in 2015. His original research introduced the dense SLAM paradigm demonstrated in KinectFusion, DTAM, and DynamicFusion, impacting a generation of real-time and interactive systems being developed in the emerging fields of AR/VR and robotics. His interests span sub-disciplines across machine perception and machine learning, from hardware-software sensor device co-design to computer vision algorithms and novel infrastructure research.

Imari Sato

Chaired by
Shohei Nobuhara
Spectral signature analysis for scene understanding

The spectral absorption of objects provides innate information about material properties that has proven useful in applications such as classification, synthetic relighting, and medical imaging, to name a few. In recent years, photoacoustic imaging (PAI) has received attention. PAI utilizes the photoacoustic effect, whereby materials emit acoustic signals under light irradiation, for shape recovery. What makes PAI different from ordinary 3D sensing is that it can provide the 3D geometric structure of a target's interior, together with wavelength-dependent absorption information, in a non-invasive manner. In this talk, I will introduce various shape recovery methods that exploit properties of light such as absorption and refraction: 3D modeling by PAI, shape from water, shape from chromatic aberration, and shape from fluorescence.


Imari Sato received the B.S. degree in policy management from Keio University in 1994. After studying at the Robotics Institute of Carnegie Mellon University as a visiting scholar, she received the M.S. and Ph.D. degrees in Interdisciplinary Information Studies from the University of Tokyo in 2002 and 2005, respectively. In 2005, she joined the National Institute of Informatics, where she is currently a professor. Concurrently, she serves as a visiting professor at the Tokyo Institute of Technology and a professor at the University of Tokyo. Her primary research interests are in the field of computer vision (physics-based vision, spectral analysis, and image-based modeling). She has received various research awards, including the Young Scientists' Prize of the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (2009) and the Microsoft Research Japan New Faculty Award (2011).