Text translation: Template extraction for a bidirectional English-Filipino example-based machine translation

Date of Publication

2006

Document Type

Bachelor's Thesis

Degree Name

Bachelor of Science in Computer Science

Subject Categories

Computer Sciences

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Ethel C. Ong

Defense Panel Member

Ethel Ong

Allan Borra

Rachel Roxas

Abstract/Summary

A bidirectional English-Filipino example-based machine translation system that learns and uses templates is presented. The system uses machine learning techniques to extract templates from a given bilingual corpus, and these templates are then used to translate English input text into Filipino and vice versa. The system implements the similarity template learning algorithm described by Cicekli et al. (2001) but goes further by introducing template refinement and the derivation of templates from learned chunks. To improve translation quality, new chunk alignment and splitting algorithms are introduced into the training process, and a flexible template and chunk matching scheme is established for translation. Test results confirm that a strict chunk alignment scheme is needed in training and that certain words, such as commonly occurring words, must be filtered out to produce better templates. This improves overall quality by ensuring complete template and chunk correctness in training, and it reduces word and sentence error rates in translation by as much as half. Tests also show that the highest-scoring translation among the generated candidates is consistently the best choice when checked against automatic evaluation methods. Still, much of the system implementation is limited by the quality and coverage of the lexicon and morphological references, which are patterned after those of TWiRL, a rule-based machine translator. This research is part of a three-year project on hybrid machine translation funded by the Philippine Council for Advanced Science and Technology Research and Development of the Department of Science and Technology (DOST-PCASTRD).
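To make the idea of similarity template learning more concrete, here is a simplified sketch in Python. It is not the thesis's implementation. It is a hypothetical illustration in the spirit of Cicekli et al. (2001). The code compares two translation examples on each language side, keeps the shared token sequences, and replaces the differing segment with a variable. The differing segments themselves are returned as chunk pairs. The function names, the variable symbol "X", the use of Python's difflib for alignment, and the restriction to exactly one non-empty difference per side are all assumptions made for this illustration. With exactly one difference per side, the correspondence between the English and Filipino differences is unambiguous.

```python
from difflib import SequenceMatcher

# Hypothetical helper: split two token lists into "same" and "diff" segments.
def match_sequence(a, b):
    segments = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag == "equal":
            segments.append(("same", a[i1:i2]))
        else:
            # replace/delete/insert: record the differing tokens from both sides
            segments.append(("diff", (a[i1:i2], b[j1:j2])))
    return segments

# Hypothetical sketch: learn one template plus two chunk pairs from two examples.
# Simplifying assumption: each language side must differ in exactly one
# segment, and that segment must be non-empty in both sentences.
def learn_similarity_template(pair1, pair2, var="X"):
    (e1, f1), (e2, f2) = pair1, pair2
    sides = []
    for a, b in ((e1, e2), (f1, f2)):
        segs = match_sequence(a, b)
        diffs = [value for kind, value in segs if kind == "diff"]
        if len(diffs) != 1 or not all(diffs[0]):
            return None  # outside the scope of this simplified sketch
        sides.append((segs, diffs[0]))
    (eng_segs, eng_diff), (fil_segs, fil_diff) = sides

    def fill(segs):
        # keep shared tokens, replace the single difference with the variable
        out = []
        for kind, value in segs:
            out.extend(value if kind == "same" else [var])
        return out

    template = (fill(eng_segs), fill(fil_segs))
    # each example's differing segment becomes an English-Filipino chunk pair
    chunks = [(eng_diff[0], fil_diff[0]), (eng_diff[1], fil_diff[1])]
    return template, chunks

pair1 = ("i will drink water".split(), "iinom ako ng tubig".split())
pair2 = ("i will drink coffee".split(), "iinom ako ng kape".split())
print(learn_similarity_template(pair1, pair2))
```

In this example, the pair yields the template "i will drink X" ↔ "iinom ako ng X" and the chunks water ↔ tubig and coffee ↔ kape. The system described above also refines templates, derives templates from chunks, and filters out common words, and none of that is modeled here.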

Abstract Format

html

Language

English

Format

Print

Accession Number

TU14567

Shelf Location

Archives, The Learning Commons, 12F, Henry Sy Sr. Hall

This document is currently not available here.