A corpus based-Filipino grammar checker using hybrid N-gram rules from grammatically-correct terms
Date of Publication
2016
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Allan B. Borra
Defense Panel Member
Maribeth Cheng
Allan B. Borra
Nathalie Rose Lim-Cheng
Abstract/Summary
This study examines the use of a corpus-based approach to detect grammatical errors and suggest corrections for the Filipino language. Prior to this study, this approach had not been applied to Filipino, although it has shown high potential for error detection and correction in other languages. Filipino grammar checkers are currently few, and most are rule-based. A major limitation of these existing systems is that they can only detect errors explicitly defined in their rules, which restricts them to a very limited set of error types. The proposed corpus-based approach learns grammar rules from a grammatically correct, tagged corpus and uses them to detect errors and provide suggestions. The grammar rules are hybrid n-grams composed of words, part-of-speech tags, and lemmas. Input sentences are compared against these rules using a weighted Levenshtein edit distance algorithm to determine whether they contain an error. Using this approach, five correction types can be suggested: insertion, deletion, substitution, merging, and unmerging. The approach also covers a broad range of error types: incorrect affixes, misspellings, wrong word usage, missing words, unnecessary words, incorrectly merged words, and incorrectly unmerged words. The developed system scored 64.11% in producing correct suggestions for 248 test phrases containing spelling/grammar errors. It achieved 70.95% accuracy in flagging error-free words in a 1,284-word error-free corpus. These results were obtained with a small training corpus of only 7,384 complex sentences.
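The abstract describes comparing input sentences against hybrid n-gram rules with a weighted Levenshtein edit distance. The sketch below illustrates that idea under stated assumptions. Each rule position is taken to constrain exactly one token attribute (surface word, POS tag, or lemma). The insertion, deletion, and substitution weights, the `Token`/`RuleItem` types, and the POS tag names are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


# Hypothetical token: surface word, POS tag, and lemma, as a tagger and
# lemmatizer might supply them. Tag names used below are illustrative only.
@dataclass(frozen=True)
class Token:
    word: str
    pos: str
    lemma: str


# Hypothetical hybrid n-gram element: `kind` is "word", "pos", or "lemma",
# and the token attribute of that name must equal `value` to match.
@dataclass(frozen=True)
class RuleItem:
    kind: str
    value: str

    def matches(self, tok: Token) -> bool:
        return getattr(tok, self.kind) == self.value


# Assumed, illustrative weights (the thesis's actual weights are not given).
INS_COST = 1.0  # a token required by the rule is missing from the input
DEL_COST = 1.0  # the input has an extra token the rule does not allow
SUB_COST = {"word": 0.8, "lemma": 0.9, "pos": 1.0}  # mismatch, per rule-item kind


def weighted_edit_distance(tokens: list[Token], rule: list[RuleItem]) -> float:
    """Weighted Levenshtein distance between an input token sequence and a rule."""
    n, m = len(tokens), len(rule)
    # d[i][j] = cheapest cost of aligning the first i tokens with the first j rule items
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + DEL_COST
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + INS_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            item = rule[j - 1]
            sub = 0.0 if item.matches(tokens[i - 1]) else SUB_COST[item.kind]
            d[i][j] = min(
                d[i - 1][j] + DEL_COST,      # delete input token i
                d[i][j - 1] + INS_COST,      # insert rule item j
                d[i - 1][j - 1] + sub,       # match or substitute
            )
    return d[n][m]


# Hypothetical rule learned from "Kumain ako ng mansanas": lemma "kain",
# word "ako", word "ng", then any noun.
RULE = [RuleItem("lemma", "kain"), RuleItem("word", "ako"),
        RuleItem("word", "ng"), RuleItem("pos", "NN")]
kumain = Token("kumain", "VB", "kain")
ako = Token("ako", "PRO", "ako")
ng = Token("ng", "DT", "ng")
nang = Token("nang", "CC", "nang")
mansanas = Token("mansanas", "NN", "mansanas")

print(weighted_edit_distance([kumain, ako, ng, mansanas], RULE))    # 0.0: matches the rule
print(weighted_edit_distance([kumain, ako, mansanas], RULE))        # 1.0: missing word
print(weighted_edit_distance([kumain, ako, nang, mansanas], RULE))  # 0.8: wrong word
```

A distance of zero means the input fits a learned rule. A small nonzero distance points to the cheapest edit, such as inserting "ng" or substituting "nang" with "ng", which can then be offered as a suggestion. This sketch covers only insertion, deletion, and substitution. The merging and unmerging corrections listed in the abstract would need additional edit operations that span more than one token.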
Language
English
Format
Electronic
Accession Number
CDTG006938
Shelf Location
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
Physical Description
1 computer disc ; 4 3/4 in.
Keywords
Filipino language--Grammar; Filipino language; Filipino language--Study and teaching
Recommended Citation
Go, M. (2016). A corpus based-Filipino grammar checker using hybrid N-gram rules from grammatically-correct terms. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/5335