Digital Repository

A tool for automatic segmentation of a given Sinhala text into Syllables for Speech synthesis and Speech recognition

dc.contributor.author Kumara, K.H.
dc.contributor.author Dias, N.G.J.
dc.contributor.author Sirisena, H.
dc.date.accessioned 2015-05-11T04:09:35Z
dc.date.available 2015-05-11T04:09:35Z
dc.date.issued 2006
dc.identifier.citation Kumara, K.H., Dias, N.G.J. and Sirisena, H., 2006. A tool for automatic segmentation of a given Sinhala text into Syllables for Speech synthesis and Speech recognition, Proceedings of the Annual Research Symposium 2006, Faculty of Graduate Studies, University of Kelaniya, pp 67-68. en_US
dc.identifier.uri http://repository.kln.ac.lk/handle/123456789/7354
dc.description.abstract In the present era of human-computer interaction, the educationally underprivileged and rural communities of Sri Lanka are being deprived of the technologies that pervade the growing interconnected web of computers and communications. One good solution to this problem would be computers that talk to the common man in the language he is comfortable communicating in. A significant percentage of the Sri Lankan population is educationally underprivileged. On the one hand, we claim to be building an E-Government or an E-Society in Sri Lanka; on the other hand, the advances we make are totally inaccessible to a large number of people in Sri Lanka. Under such circumstances, we cannot expect rural or educationally underprivileged people to use computers and IT products unless we remove the need to be literate, which stands as a barrier between them and computers. However, interaction between the computer and the user is at present largely through keyboard and screen-oriented systems. In the current Sri Lankan context, this restricts the use of computers to a minuscule fraction of the population, namely those who are both computer-literate and conversant with written English. To enable a wider proportion of the population to benefit from information technology, there is a dire need for an interface other than the keyboard-and-screen interface that is widely in use at present. Speech technologies promise to be the next-generation user interface. Software applications with speech and voice recognition abilities have a better chance of communicating with a large percentage of the population, including the educationally underprivileged, the visually challenged and the computer illiterate, if these applications can speak and understand the native language. It is well known that the transcription of orthographic words into syllables is one of the principal steps of syllable-based speech synthesis and speech recognition.
Hence we put forward a dictionary-based automatic syllabification tool for Speech Synthesis and Automatic Speech Recognition in the Sinhala language. It is also capable of providing the frequency distributions of the vowels, consonants and syllables of a given Sinhala text. Although there is no universal agreement on the definition of a syllable, in this research a syllable is defined as $C_0^n V_1^n C_0^n$, where $C_0^n$ signifies 0 to n consonants and $V_1^n$ signifies 1 to n vowels. In this tool, the syllable boundaries of a given Sinhala sentence are detected in four main phases: (1) Reformat everything encountered (e.g. digits, abbreviations) into words and punctuation. (2) Derive a phonemic representation for each word. (3) Determine the $C_0^n V_1^n$ units for a given word. (4) Reformat the above $C_0^n V_1^n$ units according to the $C_0^n V_1^n C_0^n$ definition in order to obtain the syllable boundaries. The following example will give a better explanation of the algorithm. en_US
dc.language.iso en en_US
dc.publisher University of Kelaniya en_US
dc.title A tool for automatic segmentation of a given Sinhala text into Syllables for Speech synthesis and Speech recognition en_US
dc.type Article en_US
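
Phases (3) and (4) of the abstract, together with the frequency counts it mentions, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes phases (1) and (2) have already produced a list of romanized phoneme symbols per word. The vowel inventory `VOWELS`, the function names, and the rule for splitting a consonant cluster between two syllables (the last consonant becomes the onset of the next syllable, the rest become the coda of the previous one) are all assumptions made here for illustration; the record does not state the rule the tool actually uses.

```python
from collections import Counter

# Hypothetical romanized vowel inventory, for illustration only.
VOWELS = {"a", "aa", "ae", "aae", "i", "ii", "u", "uu", "e", "ee", "o", "oo"}


def is_vowel(phoneme):
    return phoneme in VOWELS


def cv_units(phonemes):
    """Phase 3: group phonemes into units of 0..n consonants + 1..n vowels.

    Returns (units, tail), where tail holds any word-final consonants
    that are not followed by a vowel.
    """
    units, current, seen_vowel = [], [], False
    for p in phonemes:
        if not is_vowel(p) and seen_vowel:
            # A consonant after a vowel run closes the current unit.
            units.append(current)
            current, seen_vowel = [], False
        current.append(p)
        if is_vowel(p):
            seen_vowel = True
    if seen_vowel:
        units.append(current)
        current = []
    return units, current


def syllabify(phonemes):
    """Phase 4: turn C*V+ units into C*V+C* syllables.

    Assumed rule (not from the source): in a consonant cluster between
    two vowel runs, only the last consonant stays as the onset of the
    next syllable; the others become the coda of the previous syllable.
    """
    units, tail = cv_units(phonemes)
    if not units:
        return [tail] if tail else []
    syllables = [list(units[0])]
    for unit in units[1:]:
        onset_len = 0
        while not is_vowel(unit[onset_len]):
            onset_len += 1
        move = max(onset_len - 1, 0)
        syllables[-1].extend(unit[:move])
        syllables.append(list(unit[move:]))
    syllables[-1].extend(tail)  # word-final consonants close the last syllable
    return syllables


def frequencies(words):
    """Count vowels, consonants and syllables over a list of phoneme lists."""
    vowels, consonants, syllables = Counter(), Counter(), Counter()
    for phonemes in words:
        for p in phonemes:
            (vowels if is_vowel(p) else consonants)[p] += 1
        for syllable in syllabify(phonemes):
            syllables["".join(syllable)] += 1
    return vowels, consonants, syllables
```

Under these assumptions, `syllabify(["a", "m", "m", "a"])` splits the medial cluster and yields the two syllables `am` and `ma`, and `frequencies` tallies each category with a `Counter`. A real system would replace `VOWELS` with the Sinhala phoneme inventory and the cluster rule with the language's actual phonotactic constraints.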

