Date of Publication

12-2005

Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science

Subject Categories

Computer Sciences

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Rachel Editha O. Roxas

Defense Panel Chair

Allan B. Borra

Defense Panel Member

Charibeth K. Cheng
Ethel C. Ong

Abstract/Summary

Selecting the right translation of a word from among several options in the lexicon is a core problem in machine translation. It is not enough that a word in context is translated; the translation must also be appropriate to that context. This thesis presents an automated approach to target word selection based on the word-to-sense and sense-to-word relationships between source words and their translations, utilizing syntactic relationships (subject-verb, verb-object, adjective-noun). Translation selection proceeds in two steps: sense disambiguation of source words, based on knowledge from a bilingual dictionary and word similarity measures from WordNet, followed by selection of a target word using statistics from a target language corpus. The system was tested on 145,746 word pairs in syntactic relationships, extracted from target corpora gathered from various online editorials, Tagalog readings, and the Tagalog New Testament, with a total of 317,113 words. A sense profile with 2,681 entries for source words, including clues for disambiguation and target translations, was built from an existing bilingual dictionary. A test on 200 sentences with ambiguous words (an average of 4 senses each) in three categories (nouns, verbs, and adjectives) produced an overall accuracy of 63.89% for selecting word translations, with a standardized precision of at least 80% for generating the expected translations in each category. Adding reliable clues for sense disambiguation and applying smoothing techniques can further improve the overall performance of the method. The words produced by the system are root words, so the system can be further improved by integrating morphological generation into a machine translation system to produce more fluent translations. In addition, the method developed here can be extended to accommodate the translation of other content words and other syntactic categories. Furthermore, the method can be improved to support bidirectional translation (Tagalog to English).
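
The second step described in the abstract, choosing a target word from statistics over syntactically related word pairs, can be illustrated with a minimal sketch. This is not the thesis implementation: the function `select_translation`, the `PAIR_COUNTS` table, and all counts in it are hypothetical and invented for illustration, and the WordNet-based sense disambiguation step is not shown. The sketch assumes the candidate translations and the translation of the context word are already known, and simply picks the candidate that co-occurs most often with the context word in the given relation.

```python
from collections import Counter

# Invented toy counts of (relation, adjective, noun) pairs, standing in for
# statistics gathered from a Tagalog corpus. Not data from the thesis.
PAIR_COUNTS = Counter({
    ("adjective-noun", "mainit", "tubig"): 12,
    ("adjective-noun", "mainit", "sili"): 1,
    ("adjective-noun", "maanghang", "sili"): 7,
})


def select_translation(candidates, context_words, relation, pair_counts):
    """Return the candidate seen most often with the context words.

    Ties (including all-zero counts) fall back to the first candidate.
    """
    def score(candidate):
        # A Counter returns 0 for pairs that never occurred in the corpus.
        return sum(pair_counts[(relation, candidate, ctx)] for ctx in context_words)

    return max(candidates, key=score)


# Candidate root-word translations of English "hot":
# "mainit" (hot in temperature) and "maanghang" (spicy).
hot_candidates = ["mainit", "maanghang"]
print(select_translation(hot_candidates, ["tubig"], "adjective-noun", PAIR_COUNTS))
print(select_translation(hot_candidates, ["sili"], "adjective-noun", PAIR_COUNTS))
```

With these toy counts, "hot" modifying "tubig" (water) resolves to "mainit", while "hot" modifying "sili" (chili) resolves to "maanghang". Unseen pairs score zero here, which is the kind of sparsity the smoothing techniques mentioned in the abstract would address.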

Abstract Format

html

Language

English

Format

Electronic

Accession Number

CDTG003938

Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

xi, 268 leaves, 1 computer optical disc ; 4 3/4 in.

Keywords

Machine translating; Information theory
