Inria-Kyoto Workshop on 4D Modeling

A workshop on 4D Modeling will be held on 26 November 2013 at INRIA Grenoble, as part of the INRIA-Kyoto collaboration. 4D Modeling refers to the ability to build, analyze and interpret spatio-temporal models of shapes that evolve over time using visual information, typically colour and depth images. It is a subject of growing interest in the computer vision, computer graphics and medical imaging communities. This workshop offers an opportunity to see recent work conducted in this area at Kyoto University, the Technical University of Munich and INRIA Grenoble.


Takashi Matsuyama, Kyoto University

Title: Cooperative Distributed Vision Systems for Real-Time Multi-Target Tracking and 3D Video Capture of an Object Moving in a Wide Area.

Abstract: We have been studying Cooperative Distributed Vision Systems for the past twenty years. The systems consist of a group of active cameras connected via communication networks. To make the cameras work in an integrated way, we have to develop communication protocols for cooperation as well as video data processing and camera control. This talk gives an overview of our research attainments, including 1) a Dynamic Memory Architecture to synchronize asynchronous processes, 2) a real-time multi-target tracking system composed of a group of active vision agents, and 3) a synchronised, calibrated multi-view video capture system for generating 3D video of an object moving in a wide area.
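To give a flavour of the Dynamic Memory idea mentioned in point 1), here is a minimal Python sketch: asynchronous writers append timestamped values, and a reader can query the memory at any virtual time and receive an interpolated estimate. All names and the interpolation scheme below are illustrative, not taken from the actual system.

```python
# Toy "dynamic memory": decouples asynchronous writers from readers by
# storing timestamped values and interpolating at read time.
from bisect import bisect_left

class DynamicMemory:
    def __init__(self):
        self._times = []   # sorted timestamps
        self._values = []  # values written at those timestamps

    def write(self, t, value):
        """Append an observation; writers need not be synchronized."""
        i = bisect_left(self._times, t)
        self._times.insert(i, t)
        self._values.insert(i, value)

    def read(self, t):
        """Return a linearly interpolated value at virtual time t."""
        if not self._times:
            raise ValueError("empty memory")
        i = bisect_left(self._times, t)
        if i == 0:
            return self._values[0]        # before the first sample
        if i == len(self._times):
            return self._values[-1]       # after the last sample
        t0, t1 = self._times[i - 1], self._times[i]
        v0, v1 = self._values[i - 1], self._values[i]
        w = (t - t0) / (t1 - t0)
        return v0 + w * (v1 - v0)
```

A reader asking for `read(1.0)` between samples written at times 0.0 and 2.0 gets the interpolated value, which is the essence of letting asynchronous processes exchange data without blocking each other.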

Short bio: Professor Takashi Matsuyama received B. Eng., M. Eng., and D. Eng. degrees in electrical engineering from Kyoto University, Japan, in 1974, 1976, and 1980, respectively. He is currently a professor in the Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University. He served as the director of the Academic Center for Computing and Media Studies (2002-2006), the director general of the Institute for Information Management and Communications (2005-2010), and a vice president of Kyoto University (2008-2010).
He has been studying cooperative distributed sensing-control-reasoning systems for over 30 years. Their application fields include knowledge-based image understanding, visual surveillance, 3D video, human-computer interaction, and smart energy management. He has written more than 100 journal papers and more than 20 books, including three research monographs: A Structural Analysis of Complex Aerial Photographs (Plenum, 1980), SIGMA: A Knowledge-Based Aerial Image Understanding System (Plenum, 1990), and 3D Video and its Applications (Springer, 2012).

Shohei Nobuhara, Kyoto University

Title: 3D Shape from Silhouettes in Water for Online Novel-View Synthesis.

Abstract: This talk presents a new algorithm for full 3D shape reconstruction and online free-viewpoint rendering of objects in water. The key contributions are (1) a new calibration model for the refractive projection, and (2) a new 3D shape reconstruction algorithm based on the shape-from-silhouette concept and specially designed for the new calibration model. We also propose an online free-viewpoint rendering system as a practical application of the proposed scheme, and demonstrate real-time novel-view synthesis of a fish in a water bowl captured by cameras outside the bowl.
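For readers unfamiliar with the shape-from-silhouette concept underlying contribution (2), here is a minimal visual-hull sketch: a voxel is kept only if it projects inside every camera's silhouette. The refractive projection model of the talk is replaced here by plain orthographic projections, so everything below is purely illustrative.

```python
# Toy visual hull: carve a cubic voxel grid against several views.
# Each view is a (project, mask) pair, where project maps a voxel
# (x, y, z) to 2D pixel coordinates and mask is the set of pixels
# inside the object silhouette in that view.

def visual_hull(grid_size, silhouettes):
    hull = set()
    for x in range(grid_size):
        for y in range(grid_size):
            for z in range(grid_size):
                # keep the voxel only if all views see it as foreground
                if all(project((x, y, z)) in mask
                       for project, mask in silhouettes):
                    hull.add((x, y, z))
    return hull
```

With two orthogonal views whose silhouettes each cover a single pixel, the carved hull reduces to the single voxel consistent with both, which is exactly the intersection-of-cones principle that the refractive version generalizes.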

Short bio: Shohei Nobuhara received his B.Sc. in Engineering, and M.Sc. and Ph.D. in Informatics from Kyoto University, Japan, in 2000, 2002, and 2005 respectively. From 2005 to 2010, he was a postdoctoral researcher and a research associate at Kyoto University. Since 2010, he has been a senior lecturer at Kyoto University. His research interests include computer vision, 3D video, and aqua vision – 3D video in water.

Tony Tung, Kyoto University

Title: Spatiotemporal descriptor for surface dynamics characterization.

Abstract: We present a spatiotemporal descriptor designed for dynamic surfaces. Using state-of-the-art technology, details of dynamic surfaces such as cloth wrinkles or facial expressions can be accurately reconstructed. Hence, properties such as surface rigidity and elasticity can be derived from a fine-grained categorization of surface elements. We propose a timing-based descriptor to model local spatiotemporal variations of intrinsic surface properties. The low-level descriptor encodes gaps between local event dynamics at neighboring keypoints using the timing structure of linear dynamical systems (LDS). We also introduce the bag-of-timings (BoT) paradigm for surface dynamics characterisation. Experiments are performed on synthesized and real-world datasets. We show that the proposed descriptor can be used for challenging dynamic surface classification and segmentation with respect to rigidity at surface keypoints.
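To illustrate the bag-of-timings intuition, here is a toy Python sketch: timing gaps between events at neighboring surface keypoints are quantized into bins and pooled into a histogram that characterizes the local dynamics. The LDS-based timing-structure model of the talk is abstracted away here into raw event onset times, so this is only a schematic of the pooling step.

```python
# Toy bag-of-timings: histogram of onset-time gaps over neighboring
# keypoint pairs.
#   event_times: dict keypoint -> event onset time
#   neighbors:   iterable of (keypoint_a, keypoint_b) pairs
#   bin_edges:   ascending gap thresholds, defining len(bin_edges)+1 bins

def bag_of_timings(event_times, neighbors, bin_edges):
    hist = [0] * (len(bin_edges) + 1)
    for a, b in neighbors:
        gap = abs(event_times[a] - event_times[b])
        # index of the bin this gap falls into
        i = sum(1 for edge in bin_edges if gap >= edge)
        hist[i] += 1
    return hist
```

A rigid surface patch, where neighboring keypoints move nearly in unison, concentrates its mass in the small-gap bins, while a loose, deformable patch spreads mass toward the large-gap bins; this is the kind of signature the classifier can exploit.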

Short bio: Tony Tung received the Ph.D. degree in Signal and Image Processing from the École Nationale Supérieure des Télécommunications de Paris in 2005. He worked as an IT consultant (2000-2002) and senior R&D engineer (2005-2008) in IT companies, and as a postdoctoral research fellow at Kyoto University (2005, 2008-2009). Since 2010, he has been an Assistant Professor at Kyoto University, working jointly in the Department of Intelligence Science and Technology, Graduate School of Informatics, and at the Academic Center for Computing and Media Studies. His research interests include computer vision, pattern recognition, shape modeling, and multimodal interaction. He was awarded fellowships from the Japan Society for the Promotion of Science in 2005 and 2008, and a Grant-in-Aid for Young Scientists in 2011.

Slobodan Ilic, TU Munich

Title: Multi-task Forest for Human Pose Estimation in Depth Images.

Abstract: In this talk, we address the problem of human body pose estimation from depth data. Previous works based on random forests relied either on a classification strategy to infer the different body parts or on a regression approach to predict the joint positions directly. To permit the inference of very generic poses, those approaches did not consider additional information during the learning phase, such as the activity being performed. In the present work, we introduce a novel approach that integrates additional information at training time and actually improves pose prediction at test time. Our main contribution is a structured output forest that solves a joint regression-classification task: each foreground pixel of a depth image is associated with its relative displacements to the 3D joint positions as well as with an activity class. Integrating activity information into the objective function during forest training better separates the space of 3D poses, leading to a better model of the posterior. Our approach thereby provides improved pose prediction and, as a by-product, can give an estimate of the performed activity. We performed experiments on a dataset recorded with 10 people and annotated with ground-truth body poses from a motion capture system. To demonstrate the benefits of our approach, we divided the poses into 10 different activities for the training phase, which improves human pose estimation compared to a pure regression forest approach.
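As a rough illustration of what a joint regression-classification objective looks like, here is a minimal Python sketch of a split criterion: a candidate split of the training pixels is scored by combining the variance of the 3D joint offsets (regression term) with the entropy of the activity labels (classification term). The sample representation, function names and weighting are illustrative assumptions, not the actual formulation used in the talk.

```python
# Toy split criterion mixing regression and classification impurity.
# Each sample is a dict with a 3D "offset" (pixel-to-joint displacement)
# and an "activity" label.
from math import log

def offset_variance(samples):
    """Mean squared distance of the 3D offsets to their centroid."""
    n = len(samples)
    cx = sum(s["offset"][0] for s in samples) / n
    cy = sum(s["offset"][1] for s in samples) / n
    cz = sum(s["offset"][2] for s in samples) / n
    return sum((s["offset"][0] - cx) ** 2 + (s["offset"][1] - cy) ** 2
               + (s["offset"][2] - cz) ** 2 for s in samples) / n

def label_entropy(samples):
    """Shannon entropy of the activity labels."""
    counts = {}
    for s in samples:
        counts[s["activity"]] = counts.get(s["activity"], 0) + 1
    n = len(samples)
    return -sum(c / n * log(c / n) for c in counts.values())

def split_score(left, right, alpha=0.5):
    """Lower is better: size-weighted sum of both impurities."""
    n = len(left) + len(right)
    score = 0.0
    for child in (left, right):
        w = len(child) / n
        score += w * (alpha * offset_variance(child)
                      + (1 - alpha) * label_entropy(child))
    return score
```

A split whose children are pure in both offsets and activity labels scores zero, so greedily minimizing this score drives the tree toward leaves that are coherent in pose and activity at the same time, which is the intuition behind mixing the two terms.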

Short bio: Slobodan Ilic is a senior research scientist at TU Munich, Germany. Since February 2009 he has been leading the Computer Vision Group of the CAMP laboratory at TUM. From June 2006 he was a senior researcher at Deutsche Telekom Laboratories in Berlin. Before that he was a postdoctoral fellow for one year at the Computer Vision Laboratory, EPFL, Switzerland, where he received his PhD in 2005. His research interests include deformable surface modeling and tracking, 3D reconstruction, real-time object detection and tracking, object detection and classification in 3D data, image segmentation, and human body pose estimation from depth and multiple cameras. Slobodan Ilic serves as a regular program committee member for all major computer vision conferences, such as CVPR, ICCV and ECCV, as well as for journals such as TPAMI and IJCV. Besides his active academic involvement, Slobodan has strong relations with industry and supervises a number of PhD students supported by industry.

Paul Chun-Hao Huang, TU Munich

Title: Robust human shape and pose tracking.

Abstract: In this talk, we address the problem of marker-less human performance capture from multiple camera videos. We consider in particular the recovery of both shape and parametric motion information, as often required in applications that produce and manipulate animated 3D content from multiple videos. To this aim, we propose an approach that jointly estimates skeleton joint positions and surface deformations by fitting a reference surface model to 3D point reconstructions. The approach is based on a probabilistic deformable surface registration framework coupled with a bone-binding energy. The former makes soft assignments between the model and the observations, while the latter guides the skeleton fitting. The main benefit of this strategy lies in its ability to handle the outliers and erroneous observations frequently present in multi-view data. For the same purpose, we also introduce a learning-based method that partitions the point cloud observations into rigid body parts; this further discriminates the input data into classes and reduces the complexity of the association between the model and the observations. We argue that such a combination of learning-based matching and probabilistic fitting efficiently handles unreliable observations with spurious geometry or missing data, and hence reduces the need for tedious manual intervention.
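To make the soft-assignment idea concrete, here is a minimal EM-style Python sketch: each observed point is softly assigned to every model point with a Gaussian weight, and the model's translation is updated from the weighted residuals. This toy estimates a global translation only, whereas the actual method deforms a full surface model with a skeleton; all names and the single-step update are illustrative.

```python
# One soft-assignment update of a global translation fitting a model
# point set to observed 3D points. Observations far from every model
# point receive negligible total weight and are effectively ignored,
# which is the outlier-robustness property mentioned above.
from math import exp

def soft_assign_step(model, observations, sigma=1.0):
    shift = [0.0, 0.0, 0.0]
    total = 0.0
    for o in observations:
        # Gaussian responsibility of each model point for this observation
        weights = [exp(-sum((o[d] - m[d]) ** 2 for d in range(3))
                       / (2 * sigma ** 2)) for m in model]
        norm = sum(weights)
        if norm == 0.0:
            continue  # extreme outlier: no model point claims it
        for w, m in zip(weights, model):
            for d in range(3):
                shift[d] += (w / norm) * (o[d] - m[d])
        total += 1.0
    return [s / total for s in shift]
```

Iterating such updates (and, in the real system, alternating them with the bone-binding term) pulls the model toward the data while letting ambiguous points spread their influence over several candidate matches instead of committing to a single hard correspondence.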

Short bio: Chun-Hao Huang was born in Tainan, Taiwan, in 1985. He received the B.S. degree in electrical engineering from National Cheng Kung University (NCKU), Tainan, in 2008, and the M.S. degree in computer and communication engineering from NCKU in 2010. After his military service, he worked at the Research Center for Information Technology Innovation, Academia Sinica, in 2011. In October 2012 he began his Ph.D. studies at Technische Universität München. His research interests include human motion capture, 2D/3D conversion, and related computer vision topics.

Lionel Reveret, LJK-INRIA Grenoble Rhône-Alpes

Title: Motion models for markerless video tracking and physically-based animation.

Abstract: Recent results in computer vision make it easy to extract dense 3D information about surfaces in motion. This new source of data opens interesting perspectives for markerless motion analysis. In this talk, I will first present an overview of the work done at INRIA on such 3D data and how they are obtained from multiple videos. While these data are sufficient to stream the visual appearance of a scene, they intrinsically lack temporal coherence. Model tracking is required to provide time-consistent trajectories of body features. To this end, a flexible model of shape deformation has been developed and applied to the tracking of human motion data using manifold learning on Green coordinates. For neurophysiological studies, this model has been extended to track the complex body deformations involved in rodent motion. In the last part of the talk, I will present work on motion models based on the laws of physics: a simulation model using actuators from a reduced parametric space, and an optimization-based model handling multiple contacts.

Short bio: Lionel Reveret is a research scientist at INRIA Grenoble, France. He obtained his PhD from Grenoble University, followed by a postdoctoral stay at Georgia Tech. His early research focused on the modeling and analysis of talking faces. He joined INRIA in 2002 to pursue research on 3D animation and motion analysis from video. He is particularly interested in motion analysis and physically-based animation of humans and animals. He is also working on the biomechanics of rodent locomotion for neurophysiological studies of equilibrium.