dc.description.abstract |
Conventional user interfaces such as the keyboard and monitor restrict the usage of computers, so there is a pressing need for an interface beyond the keyboard-and-screen interaction that is widely in use at present. Speech technologies promise to be the next-generation user interface. In general, two technologies are needed for processing speech: speech recognition and speech synthesis. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer and can be implemented in software and/or hardware. Text-to-Speech (TTS) is one of the speech synthesis technologies; it can be defined as “the production of speech by machines, by way of the automatic phonetization of the sentences to utter”. Before a synthesizer can produce an utterance, several steps have to be completed. First, the right segments/units have to be selected. The units commonly used include diphones, half-syllables, and triphones. Many synthesizers use diphones as their basic units of concatenation. A diphone is the transition between two speech sounds, obtained from natural speech. Creating a diphone database that contains all the sound transitions of the target language is therefore critical in diphone TTS synthesis. |
en_US |