A bi-directional example-based English-Tagalog machine translation system
Date of Publication
2006
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Rachel Edita O. Roxas
Defense Panel Chair
Allan B. Borra
Defense Panel Member
Rachel Edita O. Roxas
Raquel E. Sison Buban
Abstract/Summary
Halo, a bi-directional English-Tagalog machine translation system, was created using the example-based machine translation (EBMT) approach, in which translation is based primarily on knowledge obtained from the analysis of parallel corpora. The system focuses on the creation of a knowledge base for translation and requires no linguistic knowledge before or during translation.
Halo is composed of two major phases: the knowledge extraction phase and the translation phase. From parallel corpora, databases of sentence pair examples are extracted. All the words that occur in the stored sentence pairs are indexed with information on their frequency and position. A database structure for this purpose, based on the relational database concept, was also developed. The Dice coefficient formula is used to establish a relationship between words from the two languages, and the resulting value is used to approximate the most probable translation of each word. Algorithms were developed for the following processes: build-up of the correlation table (dictionary), segmentation of the input text, translation of the segments, and recombination of the translated segments to form the final translation of the whole input text.
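The Dice coefficient step can be illustrated with a minimal sketch. This is not Halo's implementation. It assumes the common co-occurrence form of the coefficient, Dice(s, t) = 2 * C(s, t) / (C(s) + C(t)), where C(s) and C(t) count the sentence pairs containing each word and C(s, t) counts the pairs containing both. The function names `dice_table` and `best_translation`, the whitespace tokenization, and the three toy sentence pairs are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def dice_table(sentence_pairs):
    """Map (source_word, target_word) to a Dice score.

    Assumption: each word is counted once per sentence pair (presence,
    not raw frequency), and tokens are split on whitespace.
    """
    src_freq = Counter()   # sentence pairs containing each source word
    tgt_freq = Counter()   # sentence pairs containing each target word
    pair_freq = Counter()  # sentence pairs containing both words
    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def best_translation(table, word):
    """Return the target word with the highest Dice score, or None."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Toy parallel corpus, invented for this sketch.
pairs = [
    ("the house", "ang bahay"),
    ("the dog", "ang aso"),
    ("a house", "isang bahay"),
]
table = dice_table(pairs)
print(best_translation(table, "house"))
print(best_translation(table, "dog"))
```

In this toy corpus, "house" and "bahay" always appear together, so their score is the maximum of 1.0, while words that co-occur only some of the time score lower. A real correlation table would also need the position information mentioned above, which this sketch omits.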
The system was tested on subsets of parallel corpora from the 1987 Philippine Constitution and the novel Alchemist. A scoring algorithm is used to generate the two candidate translations with the highest scores (1.0 being the highest possible value). The candidate translation with the highest score is taken as the correct translation. For the Philippine Constitution test data, the average translation score at both the chunk and sentence levels is 0.85 from English to Tagalog and 0.72 from Tagalog to English. For the Alchemist corpus, the average scores from English to Tagalog are 0.56 at the chunk level and 0.64 at the sentence level; from Tagalog to English, the scores at the chunk and sentence levels are 0.63 and 0.62, respectively. The percentage of segments or chunks translated correctly, as determined manually against the expected translation for selected input sentences, is highest (66%) for Tagalog to English translation using the Alchemist corpus, while English to Tagalog translation of the same corpus has the lowest percentage of correct translations (40%). For the 1987 Philippine Constitution, the percentage of correct translations was evaluated to be 59% for English to Tagalog and 41% for Tagalog to English.
The quality of translation depends heavily on the quality and nature of the corpus used. The Philippine Constitution test data had better translation scores since strict and proper translations are necessary for such a legal document. In contrast, the Alchemist test data produced low-quality translations in which most of the segments were not translated correctly, because the sentences in the corpus were translated non-literally (or subjectively), as is typical of a literary work. In general, results show acceptable translations at the chunk level, while translations of whole input texts composed of several chunks tend to degenerate in thought because the chunks are derived from different sentence examples.
Abstract Format
html
Language
English
Format
Accession Number
TG04081; CDTG004081
Shelf Location
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
Physical Description
v, 78, 10 leaves ; 28 cm. + 1 computer optical disc + 1 cd supplement.
Keywords
Machine translating; English language--Machine translating; English language--Translating into Tagalog
Recommended Citation
Tolentino, R. (2006). A bi-directional example-based English-Tagalog machine translation system. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/3401