Parameters for talking head animation

In my Ph.D. work at ICP, I developed a phonetically oriented coding of lip gestures. The 3D modeling of the lips separates geometric modeling from articulatory modeling. During my postdoc at ICP, this approach was extended to a more complete 3D articulatory model of the face.

GEOMETRIC MODELING OF THE LIPS

First, geometric modeling makes it possible to build a 3D lip model from two views of any speaker's lips. The model is defined by the 3D locations of 30 control points and can be adapted to any speaker and any lip shape. The whole 3D surface of the lips is represented as a parametric surface that interpolates the control points by means of cubic splines. This model was the starting point for the wider face modeling carried out by Takaaki Kuratate at ATR-HIP.
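The idea of a cubic spline that passes through its control points can be illustrated on a single closed contour. The sketch below is not the surface model described above: it uses a uniform Catmull-Rom spline as a stand-in interpolating cubic, and the function name `catmull_rom_closed` and the eight-point elliptical contour are assumptions made only for this example.

```python
import math


def catmull_rom_closed(points, samples_per_segment=8):
    """Sample a closed cubic curve that passes through every control point."""
    n = len(points)
    curve = []
    for i in range(n):
        # Four consecutive control points, wrapping around the closed contour.
        p0, p1, p2, p3 = (points[(i + k - 1) % n] for k in range(4))
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            t2, t3 = t * t, t * t * t
            # Uniform Catmull-Rom basis: the segment runs from p1 (t=0) to p2 (t=1).
            curve.append(tuple(
                0.5 * (2 * b + (c - a) * t
                       + (2 * a - 5 * b + 4 * c - d) * t2
                       + (-a + 3 * b - 3 * c + d) * t3)
                for a, b, c, d in zip(p0, p1, p2, p3)
            ))
    return curve


# Hypothetical outer lip contour: 8 control points on an ellipse (not real data).
contour = [
    (3.0 * math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8), 0.0)
    for k in range(8)
]
curve = catmull_rom_closed(contour, samples_per_segment=8)
```

Every eighth sample of `curve` coincides with a control point, which is the interpolation property the lip model relies on. A full surface would apply the same kind of interpolation across several such contours.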

The 3D geometric lip model, based on interpolation of 30 control points, with smooth-shaded rendering.

ARTICULATORY MODELING OF THE LIPS

Second, the lip motion of a speaker is learned from the geometric models of 10 selected key shapes. The choice of key shapes is guided by general phonetic observations so that they cover the articulatory space of the speaker. A statistical analysis of the key shapes yields an articulatory 3D lip model of the speaker controlled by only three parameters:

1. lip rounding, which separates rounded vowels and spread vowels,
2. lower lip motion, mainly correlated with jaw opening,
3. upper lip motion, to perform full closure for stop consonants.
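One common way to reduce a set of key shapes to a few control parameters is principal component analysis (PCA). The text above only says "statistical analysis", so PCA is an assumption here, and the key shapes below are random placeholders rather than measured lip data. Plain PCA axes are also not guaranteed to line up with phonetic gestures such as rounding or closure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder training set: 10 key shapes, each 30 control points in 3D,
# flattened to a 90-vector. Real shapes would come from the geometric model.
n_shapes, n_points, n_params = 10, 30, 3
key_shapes = rng.normal(size=(n_shapes, n_points * 3))

mean_shape = key_shapes.mean(axis=0)
centered = key_shapes - mean_shape

# PCA through SVD: rows of vt are orthonormal directions of shape variation,
# ordered by decreasing variance.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:n_params]  # shape (3, 90)


def encode(shape):
    """Project a flattened shape onto the three retained components."""
    return basis @ (shape - mean_shape)


def decode(params):
    """Rebuild 30 control points from three parameter values."""
    return (mean_shape + params @ basis).reshape(n_points, 3)


params = encode(key_shapes[0])
approx = decode(params)

# Fraction of the training-set variance captured by the three components.
explained = (singular_values[:n_params] ** 2).sum() / (singular_values ** 2).sum()
```

With real key shapes, `explained` would indicate how much of the observed lip variation three parameters can reproduce.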

ARTICULATORY MODELING OF THE FACE

In the MOTHER project, the approach used for the lips was extended to a 3D model of a talking face. The 3D model is learned for a different speaker from a larger training set of 34 key shapes. It is controlled by 6 articulatory parameters that separate the influence of jaw motion and of the lip muscles on the face:

1. jaw opening,
2. jaw advance,
3. lip rounding,
4. lip closure,
5. lip raising (for fricatives such as /f/ and /v/),
6. glottal height.
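Control by the six parameters listed above can be sketched as follows, assuming for illustration that each parameter adds a linear displacement to a neutral face mesh. The linear form, the function `synthesize`, the identifier spellings and the two-vertex toy mesh are all hypothetical and stand in for the learned model.

```python
PARAMETER_NAMES = (
    "jaw_opening", "jaw_advance", "lip_rounding",
    "lip_closure", "lip_raising", "glottal_height",
)


def synthesize(neutral, displacements, params):
    """Move each vertex by a weighted sum of per-parameter displacements."""
    unknown = set(params) - set(displacements)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    result = []
    for v, vertex in enumerate(neutral):
        result.append(tuple(
            coord + sum(value * displacements[name][v][axis]
                        for name, value in params.items())
            for axis, coord in enumerate(vertex)
        ))
    return result


# Toy mesh with two vertices (chin, lower lip); values are illustrative only.
neutral = [(0.0, -2.0, 0.0), (0.0, -1.0, 1.0)]
zero = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
displacements = {name: zero for name in PARAMETER_NAMES}
displacements["jaw_opening"] = [(0.0, -1.0, 0.0), (0.0, -0.5, 0.0)]
displacements["lip_rounding"] = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5)]

face = synthesize(neutral, displacements, {"jaw_opening": 1.0, "lip_rounding": 2.0})
```

Parameters left out of the call contribute nothing, so each articulator can be driven independently of the others.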

The following figures show the 3D model superimposed on the speaker image, the 3D wireframe and a rendering by texture mapping.

This work was supported by France Telecom Multimedia, in collaboration with G. Bailly, P. Badin and P. Borel at ICP.

L. Reveret, C. Benoit
A New 3D Lip Model for Analysis and Synthesis of Lip Motion in Speech Production
Proc. of the Second ESCA Workshop on Audio-Visual Speech Processing, AVSP'98, Terrigal, Australia, Dec. 4-6, 1998.

L. Reveret
Design and evaluation of a video tracking system of lip motion in speech production
PhD dissertation, INPG, Grenoble, France, June 1999.

L. Reveret, G. Bailly, P. Badin
MOTHER: A new generation of talking heads providing a flexible articulatory control for video-realistic speech animation
Proc. of the 6th Int. Conference on Spoken Language Processing, ICSLP'2000, Beijing, China, Oct. 16-20, 2000.