Using Stanford part-of-speech tagger for the morphologically-rich Filipino Language
College
College of Computer Studies
Department/Unit
Software Technology
Document Type
Conference Proceeding
Source Title
PACLIC 2017 - Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation
First Page
81
Last Page
88
Publication Date
1-1-2019
Abstract
This research focuses on the implementation of a Maximum Entropy-based part-of-speech (POS) tagger for Filipino. It uses the Stanford POS tagger, a trainable tagger that has been trained on English, Chinese, Arabic, and other languages and produces some of the highest results reported for each. The tagger was trained for Filipino on a 406k-token corpus, taking into account linguistic phenomena characteristic of Filipino such as rich morphology and intra-sentential code-switching. The resulting Filipino POS tagger achieved 96.15% tagging accuracy, which is currently the highest among existing POS taggers for Filipino by a large margin. Copyright © 2017 Matthew Phillip Go and Nicco Nocon
Recommended Citation
Go, M. V., & Nocon, N. S. (2019). Using Stanford part-of-speech tagger for the morphologically-rich Filipino Language. PACLIC 2017 - Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation, 81-88. Retrieved from https://animorepository.dlsu.edu.ph/faculty_research/484
Disciplines
South and Southeast Asian Languages and Societies
Keywords
Filipino language—Parts of speech