Thesis Defense Announcement
Thesis Defense Announcement: Hamdi Ahmed Rajeh Ali
Date: 2018-05-30  Editor: Graduate Academic Affairs Office 1


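Thesis title: Integrating CCG Supertags with English-Arabic Factored Machine Translation

Candidate: Hamdi Ahmed Rajeh Ali

Supervisor: Li Zhiyong (李智勇)

Defense committee chair: Prof. Luo Juan (罗娟)

Discipline: Computer Science and Technology

College: College of Information Science and Engineering

Defense venue: College of Information Science and Engineering, Base (基地) Room 317

Defense time: June 1, 2018, 2:30 PM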

Thesis Summary

This thesis shows how statistical machine translation (SMT) is applied to the Arabic-English language pair, and how we built factored and integrated approaches that deliver high-quality translation output. We demonstrate this development on medium-sized and large data sets for the Arabic-English language pair. In this research, we incorporate linguistic information into the target side of the translation process, using phrase-based SMT (PBSMT) as the baseline model in the experiments. We first added POS tags to the target side of the PBSMT model to create the POS model. In later experiments, we added CCG supertags to the target side of the PBSMT model to create the CCG model. Finally, unlike previous work on Arabic-English translation, we injected both POS tags and CCG supertags into the PBSMT model to create what we call the Integrated model. We present various experiments comparing the four models (phrase-based model, POS model, CCG model and Integrated model) for Arabic-English translation.

As a further contribution of the research, we show the impact of word segmentation on CCG-based SMT. This research presents a comparative study of two approaches to statistical machine translation (SMT). We present a study of factored machine translation for the Arabic-English language pair, using training, tuning and test data from the multi-UN domain. Our experiments using POS tags, CCG supertags and segmentation of Arabic sentences yielded considerable improvement in BLEU score. The results show considerable improvement when translating segmented rather than non-segmented Arabic into English. As the third contribution of this thesis, we address the problems of data sparsity and ambiguity; we report that applying transliteration to the source side of the corpora improves performance more than using the standard form.
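
The four configurations compared in the summary differ only in which factors are attached to each target-side word. As a minimal sketch, assuming the vertical-bar factor notation used by factored phrase-based toolkits such as Moses (the summary does not name the toolkit or data format used in the thesis), the hypothetical helper below builds each variant for one English sentence. The function name, the example sentence, and its POS tags and CCG supertags are illustrative assumptions, not data from the thesis.

```python
def factor_tokens(words, pos_tags=None, supertags=None):
    """Join per-word factors with '|' (assumed Moses-style factored format).

    Hypothetical helper for illustration only: each output token is
    word, word|POS, word|CCG or word|POS|CCG depending on which
    factor lists are supplied.
    """
    factors = [words]
    if pos_tags is not None:
        factors.append(pos_tags)
    if supertags is not None:
        factors.append(supertags)
    # Every factor list must describe the same number of tokens.
    if len({len(f) for f in factors}) != 1:
        raise ValueError("all factor lists must have the same length")
    return " ".join("|".join(token) for token in zip(*factors))


# Illustrative target-side sentence with assumed tags.
words = ["the", "committee", "met"]
pos = ["DT", "NN", "VBD"]
ccg = ["NP/N", "N", "S\\NP"]

print(factor_tokens(words))            # baseline: surface words only
print(factor_tokens(words, pos))       # POS model
print(factor_tokens(words, None, ccg)) # CCG model
print(factor_tokens(words, pos, ccg))  # Integrated model
```

Keeping all factors on one token makes it easy to switch a single corpus between the four model configurations by choosing which factor lists to pass.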

Main Academic Publications

[1] Hamdi Ahmed Rajeh Ali, Zhiyong Li, Abdullah Mohammed Ayedh. A Novel Approach by Injecting CCG Supertags into an Arabic–English Factored Translation Machine. DOI: 10.1007/s13369-016-2075-9. (SCI indexed)

[2] Hamdi Ahmed Rajeh Ali, Zhiyong Li, Al-Ghaili Mohammed. The Impact of Word Segmentation on CCG-based Arabic-English SMT. DOI: 10.12783/dtcse/aita2017/16013. (Ei indexed)