MBROLA Formatted Diphone Database for Sinhala Language

Kumara, K.H.; Dias, N.G.J.; Wickramasinghe, R.l.P.

MBROLA Formatted Diphone Database for Sinhala Language

Kumara, K.H.; Dias, N.G.J.; Wickramasinghe, R.l.P.

URI:
http://repository.kln.ac.lk/handle/123456789/7461

Citation: Kumara, K.H., Dias, N.G.J. and Wickramasinghe, R.l.P., 2007. MBROLA Formatted Diphone Database for Sinhala Language, Proceedings of the Annual Research Symposium 2007, Faculty of Graduate Studies, University of Kelaniya, pp 137.

Date: 2007

Abstract:

Diphone synthesis is one of the most popular methods used for creating a synthetic voice from recordings or samples of a particular person. Diphones are speech units that begin in the middle of the stable state of a phone and end in the middle of the following phone. The main interest in diphone synthesis is that they minimize the concatenation problems. The aim of the MBROLA project, recently initiated by the Facult' e Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications. Central to the MBROLA project is MBROLA 2.00, a speech synthesizer based on the concatenation of diphones, takes a list of allophones associated with prosodic information as input and outputs 16 bit linear speech samples. Diphone databases tailored to the MBROLA format are necessary to run the synthesizer. Therefore we put forward a Diphone database, tailored to the MBROLA format, to generate synthetic voice for Sinhala language through MBROLA .pho reader. The first step of building the diphone database was the fixing a list of all the phones (acoustic instances of phonemes) of Sinhala language. Creating the diphone database was achieved in three steps: Creating a text corpus, Recording the corpus and Segmenting the speech corpus. For the text corpus, we used few selected chapters of two Sinhala novels. The corpus was then red by two (Male and Female) native Sinhala speakers, digitally recorded and stored. Then all diphones were spotted manually with the help of Speech Viewer ofCSLU toolkit which was developed by the Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, USA. A diphone database was finally created with 1004 diphone segments, which summarizes the results in the form of: the name of diphoncs, the related waveforms, their duration, and internal sub-splittings. Since we did not consider allophone variations in all instances, it may reduce the naturalness of the resulting synthetic speech. It is also possible that the number of diphone segments may higher than the above number (1004). However, most of the common occurrences of diphones were included in the database that we have developed. 137

Show full item record