Parameters for talking head animation
In my Ph.D. work at the ICP, I developed a phonetically oriented coding of lip gestures.
The 3D modeling of the lips separates geometric modeling
from articulatory modeling. During my postdoc at the ICP, this approach was extended to a more complete 3D articulatory model of the face.
GEOMETRIC MODELING OF THE LIPS
First, geometric modeling allows a 3D lip model to be created from two
views of any speaker's lips. The model is defined by the 3D locations of
30 control points and can be adapted to any speaker and any shape. The
whole 3D surface of the lips is represented as a parametric surface that
interpolates the control points by means of cubic splines. This model was
the starting point of a wider face-modeling effort by
Takaaki Kuratate at ATR-HIP.
The 3D geometric lip model, based on interpolation of 30 control points,
and a smooth-shaded rendering.
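The control-point interpolation described above can be sketched as follows. This is a minimal illustration of fitting a parametric cubic spline through a row of 3D control points; the coordinates are hypothetical stand-ins, not the actual 30-point ICP model.

```python
# Sketch: interpolating a row of 3D lip control points with a
# parametric cubic spline (hypothetical coordinates, not the ICP data).
import numpy as np
from scipy.interpolate import CubicSpline

# Five hypothetical control points along an upper-lip contour (x, y, z).
control_points = np.array([
    [-2.0, 0.0, 0.0],
    [-1.0, 0.8, 0.3],
    [ 0.0, 1.0, 0.5],
    [ 1.0, 0.8, 0.3],
    [ 2.0, 0.0, 0.0],
])

# Parameterize by cumulative chord length, then fit one spline per axis.
t = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(control_points, axis=0), axis=1))]
spline = CubicSpline(t, control_points)  # vector-valued spline over t

# Densely sample the smooth contour, e.g. for rendering a surface strip.
dense = spline(np.linspace(0.0, t[-1], 100))
print(dense.shape)  # (100, 3)
```

The chord-length parameterization keeps the sampling roughly uniform along the contour; a full lip surface would interpolate across several such contours as well.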
ARTICULATORY MODELING OF THE LIPS
Second, the lip motion of a speaker is learned from the geometric modeling
of 10 selected key shapes. The choice of key shapes is guided by general
phonetic observations so as to cover the articulatory space of the speaker.
A statistical analysis of the key shapes yields an articulatory 3D lip model
of the speaker controlled by only three parameters:
1. lip rounding, which separates rounded vowels and spread
vowels,
2. lower lip motion, mainly correlated with jaw opening,
3. upper lip motion, to perform full closure for stop consonants.
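A statistical analysis of this kind can be sketched with principal component analysis: the key shapes are stacked as vectors, and the dominant deformation modes become the control parameters. The data below are random stand-ins, not the ICP corpus, and the exact analysis used in the original work may differ.

```python
# Sketch: extracting a few articulatory parameters from key shapes by
# principal component analysis (random stand-in data, not the ICP corpus).
import numpy as np

rng = np.random.default_rng(0)
n_keys, n_points = 10, 30
# Each key shape: 30 control points x 3 coordinates, flattened to 90 values.
shapes = rng.normal(size=(n_keys, n_points * 3))

mean_shape = shapes.mean(axis=0)
centered = shapes - mean_shape
# SVD of the centered data gives the principal deformation modes.
_, s, vt = np.linalg.svd(centered, full_matrices=False)
modes = vt[:3]               # three dominant modes ~ three control parameters
params = centered @ modes.T  # per-shape values of the three parameters

# Any lip shape is then approximated as the mean plus a blend of the modes.
reconstructed = mean_shape + params @ modes
print(params.shape)  # (10, 3)
```

With phonetically chosen key shapes, the leading modes tend to align with interpretable gestures such as rounding and lip opening, which is what makes the parameters usable as articulatory controls.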
ARTICULATORY MODELING OF THE FACE
In the MOTHER project,
the approach previously used for the lips was extended to a 3D model
of a talking face. The 3D model is learned for a different speaker, with
a larger training set of 34 key shapes. This model is controlled by 6 articulatory
parameters which separate the influence of jaw motion and
the lip muscles on the face:
1. jaw opening,
2. jaw advance,
3. lips rounding,
4. lips closure,
5. lips raising (for fricatives such as /f/ and /v/),
6. glottal height.
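Once the parameters are learned, animating the face amounts to evaluating the model at a given parameter vector. The sketch below assumes a linear model (mean shape plus a learned deformation basis, one basis row per parameter); all arrays are random stand-ins, not the actual MOTHER model.

```python
# Sketch: driving a face mesh from 6 articulatory parameter values,
# assuming a linear shape model; arrays are random stand-ins.
import numpy as np

rng = np.random.default_rng(1)
n_vertices = 500                              # hypothetical mesh size
mean_face = rng.normal(size=n_vertices * 3)   # neutral face, flattened
basis = rng.normal(size=(6, n_vertices * 3))  # one deformation per parameter

def animate(params):
    """Return the mesh for a 6-vector (jaw opening, jaw advance, ...)."""
    return (mean_face + np.asarray(params, dtype=float) @ basis).reshape(n_vertices, 3)

# Full jaw opening, all other parameters neutral.
frame = animate([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(frame.shape)  # (500, 3)
```

Because the model is linear in the parameters, intermediate articulations and smooth transitions between speech targets come directly from interpolating the parameter vectors over time.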
The following figures show the 3D model superimposed on the speaker's image,
the 3D wireframe, and a rendering by texture mapping.
This work has been supported by France Telecom Multimedia, in collaboration
with G. Bailly, P. Badin, and P. Borel at the ICP.
Related articles
L. Reveret, C. Benoit
A New 3D Lip Model for Analysis and Synthesis of Lip Motion in Speech Production (PS.gz | PDF)
Proc. of the Second ESCA Workshop on Audio-Visual Speech Processing, AVSP'98, Terrigal, Australia, Dec. 4-6, 1998.
L. Reveret
Design and evaluation of a video tracking system of lip motion in speech production (PS.gz | PDF)
PhD dissertation, INPG, Grenoble, France, June 1999.
L. Reveret, G. Bailly, P. Badin
MOTHER: A new generation of talking heads providing a flexible articulatory control for video-realistic speech animation (PS.gz | PDF)
Proc. of the 6th Int. Conference of Spoken Language Processing, ICSLP'2000, Beijing, China, Oct. 16-20, 2000.