A stemming algorithm for Tagalog words

Date of Publication

2003

Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Rachel Editha O. Roxas

Abstract/Summary

Tag-SA, a Tagalog Stemming Algorithm, was developed to handle all forms of Tagalog words. It can be used for morphological analysis to derive root words, and it can also be applied to information retrieval (IR) to conflate different word forms to a common canonical form. It works by iterative affix removal and is context sensitive. The system was tested and evaluated by error counting on 6,382 word variants drawn from three sources (duplicates included). The resulting understemming error of less than 15% and overstemming error of less than 0.005% indicate that Tag-SA performs well.
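The idea of iterative, context-sensitive affix removal can be illustrated with a minimal Python sketch. It is not Tag-SA's rule set. The affix lists, the rule order, and the acceptability check (a candidate stem must keep at least three letters and two vowels) are hypothetical stand-ins for Tag-SA's context conditions. Each pass strips a prefix, a suffix, an -um-/-in- infix, and a partial reduplication whenever the remaining stem passes the check, and passes repeat until the word stops changing.

```python
"""Toy iterative affix-removal stemmer for Tagalog (hypothetical rules, not Tag-SA)."""

VOWELS = set("aeiou")

# Hypothetical affix inventories, longest first so "nag" is tried before "na".
PREFIXES = ["nakipag", "makipag", "pinag", "ipag", "mag", "nag", "pag",
            "ma", "na", "pa", "i"]
INFIXES = ["um", "in"]
SUFFIXES = ["han", "hin", "an", "in"]


def acceptable(stem: str) -> bool:
    """Hypothetical context condition: keep >= 3 letters and >= 2 vowels."""
    return len(stem) >= 3 and sum(c in VOWELS for c in stem) >= 2


def strip_prefix(word: str) -> str:
    for p in PREFIXES:
        if word.startswith(p) and acceptable(word[len(p):]):
            return word[len(p):]
    return word


def strip_suffix(word: str) -> str:
    for s in SUFFIXES:
        if word.endswith(s) and acceptable(word[:-len(s)]):
            return word[:-len(s)]
    return word


def strip_infix(word: str) -> str:
    # -um- and -in- follow the first consonant: s-um-ulat, b-in-asa.
    if len(word) > 3 and word[0] not in VOWELS:
        for inf in INFIXES:
            if word[1:1 + len(inf)] == inf:
                candidate = word[0] + word[1 + len(inf):]
                if acceptable(candidate):
                    return candidate
    return word


def strip_reduplication(word: str) -> str:
    # CV reduplication of the first syllable: "susulat" -> "sulat".
    if (len(word) >= 4 and word[0] not in VOWELS
            and word[:2] == word[2:4] and acceptable(word[2:])):
        return word[2:]
    # V reduplication of a vowel-initial syllable: "aaral" -> "aral".
    if (len(word) >= 3 and word[0] in VOWELS
            and word[0] == word[1] and acceptable(word[1:])):
        return word[1:]
    return word


def stem(word: str) -> str:
    """Apply all removal steps repeatedly until the word stops changing."""
    word = word.lower().replace("-", "")
    while True:
        new = strip_reduplication(strip_infix(strip_suffix(strip_prefix(word))))
        if new == word:
            return word
        word = new


if __name__ == "__main__":
    for w in ["nagsusulat", "sumulat", "binasa", "basahin", "kainin", "nag-aaral"]:
        print(w, "->", stem(w))
```

With these toy rules, stem("nagsusulat") and stem("sumulat") both return "sulat", stem("binasa") and stem("basahin") both return "basa", and stem("inom") is left unchanged because stripping "i" would leave a stem with only one vowel. The acceptability check is what makes the removal context sensitive here; real Tagalog morphology (suffix alternants, full reduplication, compound affixes) needs a richer rule set than this sketch.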

Abstract Format

html

Language

English

Format

Print

Accession Number

TG03581; CDTG003581

Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

ix, 103 leaves ; 28 cm.

Keywords

Computer algorithms; Tagalog language; Word processing

This document is currently not available here.
