Date of Publication
2007
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Ethel C. Ong
Defense Panel Chair
Rachel O. Roxas
Defense Panel Member
Allan B. Borra
Abstract/Summary
This research presents TExt-2, an extension of TExt Translation that learns and uses both similarity and difference templates. The system learns templates from a sentence-aligned bilingual corpus. Template extraction is based only on the surface form of the text, and an alignment method is used to find correspondences between the source and target sentences. The combination of difference templates, similarity templates, and Chunk Refinement increased the number of templates learned by 60 and the number of chunks learned by 43 compared to TExt Translation. When the training corpus itself was translated, TExt-2 translated 12 more sentences correctly than TExt Translation. When the sentences to be translated had patterns that were not encountered during training, TExt-2 also achieved a WER lower by 0.21%, a CWP higher by 0.94%, and a BLEU score higher by 0.0344. When the corpus for translation was based on the patterns learned by an STTL system, TExt Translation performed better: its WER was lower by 1.68%, its SER was lower by 13.33%, its CWP was higher by 1.13%, and its BLEU score was higher by 0.0344. TExt Translation scored better because the sentences to be translated match the templates learned through STTL, which is the algorithm used in TExt Translation. However, in the manual evaluation of the same corpus, TExt Translation and TExt-2 received tied scores. This indicates that even though the systems produced different translations, the meanings of these translations were the same. To improve the performance of the system, a large lexicon and a full-blown morphological analyzer could be added, which would eliminate the need to manually enter words into the lexicon every time a new corpus is used. Semantic analysis could be added to improve the selection of templates and chunks.
The DTC process could also be restricted in what it learns, since it sometimes learns templates that are difficult to match and use because common words are generalized.
Language
English
Format
Electronic
Accession Number
TG04329; CDTG004329
Shelf Location
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
Physical Description
v, 100 leaves ; 28 cm. + 1 computer optical disc.
Keywords
Translators (Computer programs); Template matching (Digital image processing); Machine learning; Information organization; Information retrieval
Recommended Citation
Nuñez, V. D. (2007). Combining similarity and difference templates for a bidirectional example-based machine translation. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/3524