Joint PhD offer in collaboration with Microsoft: Temporal Integration for Shape and Appearance Modeling

The Morpheo INRIA team and Microsoft Research are setting up a collaboration on the capture and modelling of moving shapes using multiple videos. This PhD proposal is part of this collaboration, with the objective of improving the quality of both the geometry and the appearance of moving shape models by leveraging the time dimension in the modelling process. The PhD will take place at Inria Grenoble Rhône-Alpes and will involve regular visits and stays at Microsoft in Redmond (USA) and Cambridge (UK).

PhD Objectives

4D modeling of dynamic scenes offers exciting new prospects for shape and performance capture in virtual and mixed reality contexts. It is the process of retrieving geometry and appearance information from multiple temporal frames of a scene or subject, observed by one or more color or depth cameras. Leveraging temporal analysis makes it possible to refine geometric and/or texture detail, as shown in recent works in the field [3,4]. Many challenges remain before such methods are applicable in a virtual/mixed reality context: information representation, computation time, extending temporal accumulation to the full body, handling everyday and loose clothing and wearable accessories, realigning sequence geometry under fast motion, and efficient and robust estimation.

We propose to explore, with a jointly supervised PhD student, how hybrid representations and pre-learned low- or mid-level geometry, motion and texture characteristics and statistics could contribute both to building statistical priors efficiently and to accelerating the retrieval of their estimates through discriminative approaches. Recent learning techniques such as convolutional neural networks can be used efficiently toward this goal.

Highly detailed representations can be acquired with specific setups (e.g. the Kinovis platform at Inria Grenoble and the Microsoft multi-camera platforms), which would allow the training data required by these techniques to be produced with high precision. Drastic improvements in the visual quality and compression of acquired 4D sequences are expected from leveraging temporal redundancy, automatically inferred smart keyframing and interpolation within the sequences, and the exploitation of correlations between shape geometry and appearance. We also envision breakthroughs in the 4D modelling domain using training strategies that would allow the geometry and appearance of 4D models to be gradually estimated and refined, even from limited or incomplete input data.

Candidate profile

The PhD candidate should hold a master’s degree in computer science. A very good background in computer vision, 3D vision and/or machine learning is expected. The candidate will be co-supervised by Jean-Sébastien Franco and Edmond Boyer at Inria Grenoble, France, with the involvement of Steve Sullivan, Andrew Fitzgibbon, Jamie Shotton and Marta Wilczkowiak at Microsoft.

Inria Grenoble
Inria is a leading French research centre in computer science with an international culture, where English is widely used. The Grenoble centre is located at the heart of the French Alps, a very dynamic region for new technologies that offers a wide range of recreational activities.

Microsoft Research
The PhD will involve regular visits to and stays at the Microsoft centres in Redmond (USA) and Cambridge (UK).


Informal inquiries can be addressed to  and . Please upload your application, quoting the PhD subject and the Microsoft collaboration, on the team website:

References

  1. Real-time human pose recognition in parts from single depth images. Jamie Shotton, Toby Sharp, Alex Kipman, Andrew Fitzgibbon, Mark Finocchio, Andrew Blake, Mat Cook, Richard Moore. Communications of the ACM, 2013, 56 (1), 116-124.
  2. High-Quality Streamable Free-Viewpoint Video. Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Dennis Evseev, David Calabrese, Hugues Hoppe, Steve Sullivan. ACM Transactions on Graphics (SIGGRAPH), 2015, 34 (4).
  3. Fusion4D: Real-time Performance Capture of Challenging Scenes. Mingsong Dou, Sameh Khamis, Yury Degtyarev, Philip Davidson, Sean Ryan Fanello, Adarsh Kowdle, Sergio Orts Escolano, Christoph Rhemann, David Kim, Jonathan Taylor, Pushmeet Kohli, Vladimir Tankovich, Shahram Izadi. ACM Transactions on Graphics (TOG), 2016, 35 (4), 114.
  4. High resolution 3D shape texture from multiple videos. Vagia Tsiminaki, Jean-Sébastien Franco, Edmond Boyer. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
  5. An efficient volumetric framework for shape tracking. Benjamin Allain, Jean-Sébastien Franco, Edmond Boyer. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  6. Volumetric 3D tracking by detection. Chun-Hao Huang, Benjamin Allain, Jean-Sébastien Franco, Nassir Navab, Slobodan Ilic, Edmond Boyer. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.