Date of Publication

2007

Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Ethel C. Ong

Defense Panel Chair

Rachel O. Roxas

Defense Panel Member

Allan B. Borra

Abstract/Summary

This research presents TExt-2, an extension of TExt Translation that learns and uses both similarity and difference templates. The system learns templates from a sentence-aligned bilingual corpus. Template extraction was based only on the surface form of the text, and an alignment method was used to find correspondences between the source and target sentences. Compared to TExt Translation, the combination of difference templates, similarity templates, and Chunk Refinement increased the number of templates learned by 60 and the number of chunks learned by 43. When the training corpus itself was translated, TExt-2 translated 12 more sentences correctly than TExt Translation. When the sentences for translation had patterns that were not encountered during training, TExt-2 also achieved a WER that is lower by 0.21%, a CWP that is higher by 0.94%, and a BLEU score that is higher by 0.0344. When the corpus for translation was based on the patterns learned by an STTL system, however, TExt Translation performed better: its WER was lower by 1.68%, its SER was lower by 13.33%, its CWP was higher by 1.13%, and its BLEU score was higher by 0.0344. TExt Translation got better scores because the sentences to be translated match the templates learned through STTL, which is the algorithm used in TExt Translation. However, in the manual evaluation of the same corpus, the scores of TExt Translation tied with those of TExt-2. This indicates that even though the systems produced different translations, the meanings of these translations are the same. To improve the performance of the system, a large lexicon and a full-blown morphological analyzer could be added. This would eliminate the need to manually enter words into the lexicon every time a new corpus is used. Semantic analysis could be added to improve the selection of templates and chunks. What the DTC process can learn could also be restricted, since it sometimes learns templates that are difficult to match and use because common words have been generalized.
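
The abstract reports WER differences without defining the metric. As a minimal sketch, assuming WER here denotes the standard word error rate (word-level edit distance divided by the number of reference words), it could be computed as below. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not details taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: tokens are split on whitespace; the thesis may have
    tokenized differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a lower WER means the system output needs fewer word edits to match the reference translation, which is consistent with how the abstract treats a lower WER as the better score.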

Abstract Format

html

Language

English

Format

Electronic

Accession Number

TG04329; CDTG004329

Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

v, 100 leaves ; 28 cm. + 1 computer optical disc.

Keywords

Translators (Computer programs); Template matching (Digital image processing); Machine learning; Information organization; Information retrieval
