Abstract
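The production and distribution of a speech corpus should be treated as a systematic engineering process in which every step follows a defined specification, so that data can be exchanged. Following such corpus production specifications, this paper introduces RASC863 (863 annotated 4 regional accent speech corpus), the 863 speech corpus of Mandarin spoken with four major regional accents (Shanghai, Guangzhou, Chongqing and Xiamen). RASC863 comprises a spontaneous speech part, a read speech part (phonetically balanced sentences and common colloquial sentences) and dialect vocabulary. For the spontaneous part, 160 topics were defined; each speaker freely chose one and talked about it for 4-5 minutes. The read material consists of 2,200 selected phonetically balanced sentences and 600 common colloquial sentences. Each dialect site has 200 speakers, 800 in total, covering different ages, genders and educational backgrounds.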
2. Speech corpus production process and general specifications
The production specifications involved, and what each of them means, are listed in Table 1.
Table 1: General specifications for producing a speech corpus