Digital Repository

A Study on the Utility of Hierarchical Phrase-Based Model for Low Resource Languages.


dc.contributor.author Shanmugarasa, Y.
dc.contributor.author Thayasivam, U.
dc.date.accessioned 2017-12-13T07:08:51Z
dc.date.available 2017-12-13T07:08:51Z
dc.date.issued 2017
dc.identifier.citation Shanmugarasa, Y. and Thayasivam, U. (2017). A Study on the Utility of Hierarchical Phrase-Based Model for Low Resource Languages. The Third International Conference on Linguistics in Sri Lanka, ICLSL 2017. Department of Linguistics, University of Kelaniya, Sri Lanka. p128. en_US
dc.identifier.uri http://repository.kln.ac.lk/handle/123456789/18539
dc.description.abstract With the rise of the internet, people have gained more opportunities to engage globally, but communication is made harder by differences in language. English is widely spoken, yet there is no assurance that everyone is proficient in it, so translation plays a major role. South Asian languages are currently translated mainly with traditional statistical and neural machine translation approaches. However, these languages lack the necessary natural language resources and tools and are therefore classified as low-resource languages, which limits the effectiveness of machine translation for them. Compared with English, South Asian languages are morphologically rich and use different sentence structures. For example, English sentences follow a subject-verb-object order, whereas most South Asian languages follow a subject-object-verb order. Because the official languages of Sri Lanka are low-resource, traditional statistical machine translation cannot produce sentences with an acceptable structure when translating them: sub-phrases can be reordered only by the distortion reordering model, which is independent of their context. In addition, using phrases longer than three words barely improves translation, because such phrases are infrequent in the corpora owing to data sparsity. To overcome these problems, hierarchical phrase-based translation, which uses grammar rules in the form of a Synchronous Context-Free Grammar (SCFG), can be used. Moses was used to build the baseline system. The experiments used 50,000 parallel sentences for Tamil and English. Measured by BLEU, the hierarchical phrase-based model achieves 3.42 for Tamil to English translation and 1.73 for English to Tamil. This is an improvement of 0.72 over the traditional approach. For Sinhala to Tamil it achieves 11.18, and 10.73 for Tamil to Sinhala.
The system could be improved further by establishing certain rules. en_US
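The abstract's central argument is that SCFG rules let a hierarchical model reorder sub-phrases within their context, whereas the distortion model reorders them independently of context. The Python sketch below is a minimal illustration under stated assumptions. The rules, the `Rule` and `Derivation` classes, the `realize` function and the Tamil transliterations ("naan soru saappitten" for "I ate rice") are all hypothetical and hand-written. They are not rules extracted by Moses or learned from the authors' 50,000-sentence corpus. The sketch only shows how one lexicalized rule with two gaps turns an English subject-verb-object order into a subject-object-verb order; rule extraction, scoring and decoding are not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# A symbol is either a terminal word (str) or a linked gap index (int):
# the integer k stands for the co-indexed nonterminal X_k on both sides.
Symbol = Union[str, int]


@dataclass(frozen=True)
class Rule:
    """Hypothetical synchronous rule X -> <source, target>."""
    source: Tuple[Symbol, ...]
    target: Tuple[Symbol, ...]

    def __post_init__(self) -> None:
        # In an SCFG every gap on the source side must be linked to
        # exactly one gap on the target side, and vice versa.
        src_gaps = sorted(s for s in self.source if isinstance(s, int))
        tgt_gaps = sorted(s for s in self.target if isinstance(s, int))
        if src_gaps != tgt_gaps:
            raise ValueError("source and target gaps must match")


@dataclass
class Derivation:
    """A rule plus the sub-derivations that fill its gaps (child k-1 fills X_k)."""
    rule: Rule
    children: List["Derivation"] = field(default_factory=list)


def realize(d: Derivation, side: str) -> List[str]:
    """Expand a derivation into the word sequence on the given side."""
    rhs = d.rule.source if side == "source" else d.rule.target
    words: List[str] = []
    for sym in rhs:
        if isinstance(sym, int):
            # Recurse into the linked sub-derivation for this gap.
            words.extend(realize(d.children[sym - 1], side))
        else:
            words.append(sym)
    return words


# Hypothetical lexical rules (illustrative transliterations only).
subject = Rule(("I",), ("naan",))
obj = Rule(("rice",), ("soru",))
# Hierarchical rule: the verb stays between the gaps in English but moves
# after both gaps in Tamil, i.e. SVO -> SOV, tied to the word "ate".
verb_phrase = Rule((1, "ate", 2), (1, 2, "saappitten"))

sentence = Derivation(verb_phrase, [Derivation(subject), Derivation(obj)])

if __name__ == "__main__":
    print(" ".join(realize(sentence, "source")))
    print(" ".join(realize(sentence, "target")))
```

Because the reordering is written into the rule that contains "ate", it applies only in that lexical context, which is the property the abstract contrasts with distortion-based reordering. A full hierarchical system would also use a glue rule to concatenate partial translations and would score competing derivations with a log-linear model during decoding.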
dc.language.iso en en_US
dc.publisher The Third International Conference on Linguistics in Sri Lanka, ICLSL 2017. Department of Linguistics, University of Kelaniya, Sri Lanka. en_US
dc.subject Hierarchical Model en_US
dc.subject Synchronous Context-Free Grammar (SCFG) en_US
dc.subject BLEU and Publication Unit en_US
dc.subject University of Kelaniya en_US
dc.subject Sri Lanka en_US
dc.title A Study on the Utility of Hierarchical Phrase-Based Model for Low Resource Languages. en_US
dc.type Article en_US
