Integrating CCG Supertags with English-Arabic Factored Machine Translation
Hamdi Ahmed Rajeh Ali
This thesis shows how statistical machine translation (SMT) can be applied to the Arabic-English language pair, and how we built factored and integrated approaches that deliver high-quality translation output. We demonstrate this development on medium-sized and large data sets for the Arabic-English pair. Throughout this research, we address how to incorporate linguistic information into the target side of the translation process, using phrase-based SMT (PBSMT) as the baseline model in the experiments. We added POS tags to the target side of the PBSMT model to create the POS model. In later experiments, we added CCG supertags to the target side of the PBSMT model to create the CCG model. Finally, unlike previous work on Arabic-English translation, we injected both POS tags and CCG supertags into the PBSMT model to create what we call the Integrated model. We present various experiments that compare the four models (phrase-based model, POS model, CCG model and Integrated model) for Arabic-English translation.
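The four configurations compared above differ only in which factors accompany each target-side word. The following is a minimal sketch of that idea, assuming the pipe-delimited token format used by factored SMT toolkits such as Moses; the abstract does not name the toolkit or the exact data format. The helper `to_factored`, the example sentence and its tags are hypothetical illustrations, not data from the thesis.

```python
from typing import Sequence


def to_factored(words: Sequence[str], *factor_streams: Sequence[str], sep: str = "|") -> str:
    """Join each word with its aligned factors, e.g. word|POS|CCG.

    Assumption: one factor per word in every stream, pipe-delimited tokens.
    """
    for stream in factor_streams:
        if len(stream) != len(words):
            raise ValueError("every factor stream must align one-to-one with the words")
    # zip pairs each word with its factors; with no factor streams the word is kept as-is
    return " ".join(sep.join(parts) for parts in zip(words, *factor_streams))


# Hypothetical English target sentence with illustrative tags
words = ["the", "committee", "adopted", "the", "resolution"]
pos = ["DT", "NN", "VBD", "DT", "NN"]
ccg = ["NP/N", "N", "(S\\NP)/NP", "NP/N", "N"]

baseline = to_factored(words)              # surface words only
pos_model = to_factored(words, pos)        # word|POS
ccg_model = to_factored(words, ccg)        # word|CCG
integrated = to_factored(words, pos, ccg)  # word|POS|CCG

for name, line in [("baseline", baseline), ("POS", pos_model),
                   ("CCG", ccg_model), ("integrated", integrated)]:
    print(f"{name}: {line}")
```

In this sketch the Integrated model corresponds to attaching both factor streams to every target word, while the baseline attaches none.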
As a further contribution, we show the impact of word segmentation on CCG-based SMT. This research presents a comparative study of two approaches to statistical machine translation (SMT). We present a study of factored machine translation for the Arabic–English language pair, using training, tuning and test data from the multi UN domain. Our experiments using POS tags, CCG supertags and segmentation of Arabic sentences showed considerable gains in BLEU score. The results show a considerable improvement when translating segmented rather than non-segmented Arabic into English. As the third contribution of this thesis, we address the problems of data sparsity and ambiguity; we report that applying transliteration to the source side of the corpora improves performance more than using the standard form.
Hamdi Ahmed Rajeh Ali, Zhiyong Li, Abdullah Mohammed Ayedh. A Novel Approach by Injecting CCG Supertags into an Arabic–English Factored Translation Machine. DOI: 10.1007/s13369-016-2075-9. (SCI indexed)
Hamdi Ahmed Rajeh Ali, Zhi-yong Li, Al-Ghaili Mohammed. The Impact of Word Segmentation on CCG-based Arabic-English SMT. DOI: 10.12783/dtcse/aita2017/16013. (Ei indexed)