A stemming algorithm for Tagalog words
Date of Publication
2003
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Rachel Editha O. Roxas
Abstract/Summary
Tag-SA, a Tagalog stemming algorithm, was developed to handle all forms of Tagalog words. It can be used for morphological analysis to derive root words, and it can also be applied to information retrieval (IR) to conflate different word forms to a common canonical form. It uses iterative affix removal and is context sensitive. The system was tested and evaluated by error counting on 6,382 word variants derived from three sources (duplicates included). The resulting understemming error of less than 15% and overstemming error of less than 0.005% indicate that Tag-SA performs well.
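To make "iterative affix removal" concrete, the following is a minimal Python sketch of the general technique. It is not the Tag-SA implementation: the affix lists, the minimum stem length, and the function names `strip_once` and `stem` are all assumptions chosen for illustration, and the real algorithm's rules are not given in this record. The length check and the fixed infix position stand in for the kind of context conditions a context-sensitive stemmer might apply.

```python
# Illustrative sketch only; the affix lists and thresholds are assumptions,
# not the rules used by Tag-SA.
PREFIXES = ("mag", "nag", "pag", "ma", "na", "ka")  # small hypothetical subset
INFIXES = ("um", "in")                               # inserted after the first consonant
SUFFIXES = ("han", "hin", "an", "in")
VOWELS = "aeiou"
MIN_STEM = 3  # assumed context condition: never leave fewer than 3 letters


def strip_once(word):
    """Remove at most one affix; return the word unchanged if none applies."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return word[len(prefix):]
    for infix in INFIXES:
        fits = len(word) - len(infix) >= MIN_STEM
        # An infix is only considered directly after an initial consonant.
        if fits and word[0] not in VOWELS and word[1:1 + len(infix)] == infix:
            return word[0] + word[1 + len(infix):]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[:-len(suffix)]
    return word


def stem(word):
    """Apply strip_once repeatedly until no further affix can be removed."""
    word = word.lower()
    while True:
        shorter = strip_once(word)
        if shorter == word:
            return word
        word = shorter


if __name__ == "__main__":
    for w in ("kumain", "magluto", "basahin", "nagbasa"):
        print(w, "->", stem(w))
```

With these toy rules, "nagbasa" and "basahin" both reduce to "basa", which is the conflation behavior that IR relies on. The same rules would also strip "ma" from a root such as "mahal", an overstemming error of the kind the evaluation counts, which is why a practical stemmer needs richer context conditions than this sketch has.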
Language
English
Format
Accession Number
TG03581; CDTG003581
Shelf Location
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
Physical Description
ix, 103 leaves ; 28 cm.
Keywords
Computer algorithms; Tagalog language; Word processing
Recommended Citation
Bonus, D. J. (2003). A stemming algorithm for Tagalog words. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/3111