A Tagalog morphological analyzer using example-based approach

Date of Publication

2006

Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Charibeth K. Cheng

Defense Panel Chair

Allan B. Borra

Defense Panel Member

Rachel Roxas
Michelle Wendy Tan

Abstract/Summary

Example-based approaches to building a morphological analyzer (MA) learn a language's morphology from a set of examples. Research in this area aims to address the time-consuming and costly development of rule-based MAs. Most of this research, however, centers on concatenative morphology; little work has been done on non-concatenative morphology because of its complexity. Tagalog is an example of a language that exhibits non-concatenative morphology. Some example-based MAs that can handle such morphologies incorrectly model the morphological phenomena of infixation and reduplication. An example-based MA that learns string rewrite rules from a word pair was developed to handle the different morphological phenomena in Tagalog, namely prefixation, infixation, suffixation, circumfixation, internal vowel changes, and partial and whole-word reduplication, as well as its morphotactic rules. The model was evaluated against a Filipino lexicon because Filipino is composed mainly of Tagalog words, adopts Tagalog morphology, and is commonly used in the Philippines. The model was tested using ten-fold cross-validation with 40,272 word pairs. The developed model performs better on words exhibiting infixation and reduplication and achieves an accuracy of 90% for both derivational and inflectional morphology, compared with 88% for the original model. Analysis time, on the other hand, increased from 11 minutes with the original model to 35 minutes with the developed model. The developed model can be used to discover affixes and their associated morphological categories in other languages that exhibit the same morphological phenomena. The model's current limitation is that it cannot properly model agglutination; a solution that considers syllabication and phonology for word alignment is recommended to further improve its performance.
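To make the idea of learning a string rewrite rule from a word pair concrete, the following is a minimal sketch and not the thesis's actual algorithm, which this record does not describe. It assumes a naive alignment that strips the longest shared prefix and then the longest shared suffix of a (root, derived) pair and treats what remains as the change. The names `RewriteRule`, `learn_rule`, and `apply_rule` are hypothetical, and the sketch covers only a single contiguous change (prefix, suffix, or infix), not circumfixation, vowel changes, or reduplication.

```python
from typing import NamedTuple


class RewriteRule(NamedTuple):
    """Hypothetical rule shape; not taken from the thesis."""
    kind: str    # "prefix", "suffix", "infix", or "none"
    offset: int  # number of characters shared at the start of both words
    remove: str  # material deleted from the root (often empty)
    insert: str  # material added to form the derived word


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def learn_rule(root: str, derived: str) -> RewriteRule:
    """Derive one rule from a word pair using naive prefix/suffix alignment."""
    p = shared_prefix_len(root, derived)
    # Measure the shared suffix on the remainders so it cannot overlap the prefix.
    s = shared_prefix_len(root[p:][::-1], derived[p:][::-1])
    remove = root[p:len(root) - s]
    insert = derived[p:len(derived) - s]
    if not remove and not insert:
        kind = "none"
    elif p == 0:
        kind = "prefix"
    elif s == 0:
        kind = "suffix"
    else:
        kind = "infix"
    return RewriteRule(kind, p, remove, insert)


def apply_rule(rule: RewriteRule, word: str) -> str:
    """Apply a learned rule to a new root; raise if the rule does not fit."""
    if rule.kind == "none":
        return word
    if rule.kind == "suffix":
        pos = len(word) - len(rule.remove)
    else:
        pos = rule.offset  # 0 for prefixes, shared-prefix length for infixes
    if pos < 0 or word[pos:pos + len(rule.remove)] != rule.remove:
        raise ValueError(f"rule {rule} does not apply to {word!r}")
    return word[:pos] + rule.insert + word[pos + len(rule.remove):]


if __name__ == "__main__":
    um = learn_rule("kain", "kumain")      # infix "um" after the first consonant
    hin = learn_rule("basa", "basahin")    # suffix "hin"
    print(apply_rule(um, "bili"))          # bumili
    print(apply_rule(hin, "sabi"))         # sabihin
```

Under this naive alignment, the pair sulat/sumulat yields an inserted "mu" after "su" rather than the infix "um" after "s", because both analyses produce the same surface string. This ambiguity in the sketch is one illustration of why word alignment matters for infixation.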

Abstract Format

html

Language

English

Format

Electronic

Accession Number

CDTG004073

Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

1 computer optical disc ; 4 3/4 in.

Keywords

Grammar, Comparative and general--Morphology

This document is currently not available here.
