Abstract:
This paper describes an architecture for converting
Sinhala Unicode text into a phonemic specification of
its pronunciation. The study focuses mainly on
disambiguating schwa /ə/ and /a/ vowel epenthesis for
consonants, which is one of the significant problems
found in Sinhala. This problem has been addressed by
formulating a set of rules. The proposed set of rules
was tested using 30,000 distinct words obtained from
a corpus and compared with the same words
manually transcribed to phonemes by an expert. The
Grapheme-to-Phoneme (G2P) conversion model
achieves 98% accuracy.
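
The evaluation described above, comparing rule-based output with an expert's manual transcription, can be sketched as a word-level accuracy computation. This is a minimal illustration and not the authors' evaluation code: the function name `word_accuracy`, the exact-match criterion, and the toy transcriptions are all assumptions, and the abstract does not state whether the reported figure is measured per word or per phoneme.

```python
def word_accuracy(predicted, reference):
    """Return the fraction of words whose predicted phonemic
    transcription exactly matches the reference transcription.

    Assumption: accuracy is counted per word with exact string match;
    the source does not specify the scoring unit.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty word list")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy, made-up transcriptions (not taken from the paper's corpus).
# The last pair differs only in the schwa vs. /a/ choice.
predicted = ["k a", "m ə", "t a", "r ə"]
reference = ["k a", "m ə", "t a", "r a"]

print(word_accuracy(predicted, reference))  # 3 of 4 words match: 0.75
```

Under this assumed scoring, a single wrong schwa or /a/ choice marks the whole word as incorrect, so word-level accuracy is a stricter measure than phoneme-level accuracy.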