Sentence-level morphological and phonological analyzer for Filipino (filSPAM)

Date of Publication

2011

Document Type

Bachelor's Thesis

Degree Name

Bachelor of Science in Computer Science

Subject Categories

Computer Sciences

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Shirley Chu

Defense Panel Member

Allan Borra

Nathalie Rose Lim-Cheng

Abstract/Summary

Morphological analysis is an important process in natural language processing. It deals with identifying the root word and affixes (morphemes) of a morphed word. Phonology is another facet of morphology that concerns how a word is voiced or sounded out. Various approaches and systems, such as MACTag, are used in morphological analysis to generate rules for different languages. These differ in how they identify and classify morphemes and in how they handle ambiguity. Although some systems handle morphology for Filipino, most are limited: they operate only at the word level and do not cover rules for phonology. Part-of-speech tagging is an integral part of sentence analysis that annotates each word in a sentence with its part of speech. Existing tools for part-of-speech tagging include HATPOST. These components, namely the morphological analyzer and the part-of-speech tagger, function independently of one another, and each has its own limitations that need to be addressed. This research constructs a sentence-level morphological and phonological analyzer for the Filipino language that integrates the aforementioned components in order to identify the part of speech of each Filipino word in a sentence and to generate the root word and phonology of the identified words. filSPAM (Sentence-level Phonological and Morphological Analyzer for Filipino) analyzes a given Filipino sentence and generates the corresponding parts of speech, root words, and phonology for that sentence. The system has four modules: a POS tagger, which has 54% accuracy; a morphological analyzer, which has 73.02% accuracy; a corpus-based phonological analyzer; and an unknown handler with two functions, the automaton and the generalized tree, which have 67% and 64% accuracy, respectively.

Abstract Format

html

Language

English

Format

Print

Accession Number

TU18447

Shelf Location

Archives, The Learning Commons, 12F, Henry Sy Sr. Hall

Physical Description

ix, 40, 15 leaves : illustrations ; 28 cm.

Keywords

Grammar, Comparative and general--Morphology; Natural language processing (Computer science)

This document is currently not available here.