A Synthetic "Talking Head" Based on a 3D Model and Photographs

Lai Wei (赖伟); Sun Ling (孙岭); Wang Renhua (王仁华)

DOI: 10.11834/jig.200407167
2004 | Volume 9 | Number 7


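Lai Wei¹, Sun Ling¹, Wang Renhua¹ (Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027)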

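Abstract

Visual speech research has become a very active area of human-computer interaction technology. Among the visual information related to speech, the most important is the image of the speaker's mouth shape and indeed of the whole head, that is, the "talking head". To synthesize a realistic 3D talking head model, a method is proposed that builds a realistic talking head from a 3D model and photographs of a real person. Starting from a neutral 3D human head model, feature parameters such as the face shape and the positions of the facial features are extracted from two photographs of an arbitrary person, one frontal and one in profile, and used to correct the model. Textures such as skin and hair are also extracted from the photographs, so that the synthesized model closely resembles the real person. The method combines 3D-model-based and image-library-based modeling and therefore has the advantages of both: expressions and mouth shapes can be controlled flexibly, the head can be rotated freely, synthesis runs in real time, and the result is close to a real person and highly natural. The model has been applied to a visual speech synthesis system with satisfactory results.

Keywords

talking head; visual speech synthesis; 3D model; facial animation; human-computer interaction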

A Talking Head Synthesis System Based on 3D-Model and Photo


Abstract

Research on visual speech has recently attracted growing attention and has become a very active field within human-machine interaction. The most important visual information related to speech is the speaker's lip motion, face, and even the whole head, which is called the “talking head”. To synthesize a lifelike three-dimensional (3D) talking head model, this paper proposes a novel method based on a person-independent 3D model and photos of a human face. First, features describing the face shape and the positions of the facial organs are extracted from one frontal and one profile photo and used to revise the 3D model so that it fits the real person. Then, skin and hair textures are extracted from the photos and mapped onto the revised 3D model so that it resembles that person. The method integrates 3D-model-based and image-library-based modeling and has the advantages of both: the model is flexible in synthesizing lip motions and expressions, can be rotated freely, can be synthesized in real time, and yields a highly natural, lifelike 3D talking head. The model has been applied in a visual text-to-speech (TTS) talking head synthesis system, with satisfactory results.

Keywords

talking head; visual text-to-speech; 3D model; face animation
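
The abstract describes correcting a neutral 3D head model with feature positions taken from one frontal and one profile photo, but it does not give the deformation scheme. The following is a minimal sketch of one plausible approach, not the authors' method: frontal landmarks are assumed to supply (x, y), profile landmarks (z, y), and the mesh is deformed by inverse-distance-weighted interpolation of the landmark displacements. The function names `landmarks_3d` and `adapt_mesh` and all parameters are hypothetical, and both photos are assumed to be already scaled and aligned to the model's coordinate frame.

```python
import numpy as np

def landmarks_3d(front_xy, side_zy):
    """Merge frontal (x, y) and profile (z, y) landmarks into 3D points.

    Assumption: row i of both arrays refers to the same facial feature.
    The height y is visible in both views, so the two values are averaged.
    """
    front_xy = np.asarray(front_xy, dtype=float)
    side_zy = np.asarray(side_zy, dtype=float)
    x = front_xy[:, 0]
    z = side_zy[:, 0]
    y = 0.5 * (front_xy[:, 1] + side_zy[:, 1])
    return np.stack([x, y, z], axis=1)

def adapt_mesh(vertices, neutral_landmarks, target_landmarks,
               power=2.0, eps=1e-9):
    """Deform a neutral mesh so its landmarks move toward the targets.

    Each vertex is shifted by a weighted average of the landmark
    displacements; weights fall off with distance to each neutral
    landmark (inverse-distance weighting, an assumed choice).
    """
    vertices = np.asarray(vertices, dtype=float)
    neutral_landmarks = np.asarray(neutral_landmarks, dtype=float)
    target_landmarks = np.asarray(target_landmarks, dtype=float)

    displacements = target_landmarks - neutral_landmarks          # (K, 3)
    # Distance from every vertex to every neutral landmark: (N, K)
    dists = np.linalg.norm(
        vertices[:, None, :] - neutral_landmarks[None, :, :], axis=2)
    weights = 1.0 / (dists ** power + eps)
    weights /= weights.sum(axis=1, keepdims=True)                 # rows sum to 1
    return vertices + weights @ displacements

# Usage with made-up coordinates: three landmarks on a tiny "mesh".
neutral = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.5]])
target = landmarks_3d(front_xy=[[0.0, 0.0], [1.2, 0.0], [0.0, 1.1]],
                      side_zy=[[0.0, 0.0], [0.0, 0.0], [0.6, 1.1]])
mesh = np.vstack([neutral, [[0.5, 0.5, 0.25]]])   # landmarks plus one free vertex
adapted = adapt_mesh(mesh, neutral, target)
```

Inverse-distance weighting keeps the sketch short and guarantees that a vertex lying on a landmark follows that landmark almost exactly. A radial-basis-function fit would give smoother deformations at the cost of solving a linear system.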
