A hybrid agent for automatically determining and extracting the 5Ws of Filipino news articles
College
College of Computer Studies
Department/Unit
Software Technology
Document Type
Conference Proceeding
Source Title
CEUR Workshop Proceedings
Volume
1986
First Page
50
Last Page
56
Publication Date
1-1-2017
Abstract
As the number of sources of unstructured data continues to grow exponentially, manually reading through all of this data becomes notoriously time-consuming. Thus, there is a need for faster understanding and processing of this data, which can be achieved by automating the task through information extraction. In this paper, we present an agent that automatically detects and extracts the 5Ws, namely the who, when, where, what, and why, from Filipino news articles using a hybrid of machine learning and linguistic rules. The agent caters specifically to the Filipino language by working with its unique features, such as ambiguous prepositions and markers, focus instead of subject and predicate, dialect influences, and others. To make the most of the machine learning algorithms, techniques such as linguistic tagging and weighted decision trees are used to preprocess and filter the data and to refine the final results. The results show that the agent achieved an accuracy of 63.33% for who, 71.38% for when, 58.25% for where, 89.20% for what, and 50.00% for why. Copying permitted for private and academic purposes. Copyright © by the paper's authors.
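The abstract does not spell out the agent's linguistic rules. The following is therefore only a minimal, hypothetical sketch of one kind of rule-based step such a hybrid could use: proposing 5W candidates from the words that follow common Filipino markers. It is not the authors' method. The marker lexicon (`MARKERS`), the span cap (`MAX_SPAN`), and the function `extract_candidates` are all illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical marker lexicon (an assumption for illustration only): each Filipino
# function word is mapped to the W it often introduces. Real usage is ambiguous;
# "sa", for example, can also mark time, direction, or a recipient.
MARKERS = {
    "si": "who",      # personal-name marker (singular)
    "sina": "who",    # personal-name marker (plural)
    "noong": "when",  # past-time marker
    "sa": "where",    # general preposition, often locative
    "dahil": "why",   # "because"
}

MAX_SPAN = 4  # hypothetical cap on candidate length, in tokens


def extract_candidates(sentence: str) -> dict[str, list[str]]:
    """Return candidate phrases per W, taken from the tokens that follow a marker."""
    # Split into words (hyphenated words kept whole) and single punctuation marks.
    tokens = re.findall(r"[\w-]+|[^\w\s]", sentence)
    candidates: dict[str, list[str]] = defaultdict(list)
    for i, tok in enumerate(tokens):
        w = MARKERS.get(tok.lower())
        if w is None:
            continue
        span = []
        for nxt in tokens[i + 1 : i + 1 + MAX_SPAN]:
            # Stop at punctuation or at the next marker.
            if not nxt[0].isalnum() or nxt.lower() in MARKERS:
                break
            span.append(nxt)
        if span:
            candidates[w].append(" ".join(span))
    return dict(candidates)


print(extract_candidates("Dumating si Juan dela Cruz sa Davao noong Lunes dahil may pulong."))
```

On the sample sentence this proposes "Juan dela Cruz" (who), "Davao" (where), "Lunes" (when), and "may pulong" (why). Because a marker such as "sa" does not reliably signal a location, rules like these would over-generate. In a hybrid design, candidates of this kind could be handed to a trained classifier for filtering or ranking.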
Recommended Citation
Livelo, E. S., Ver, A. O., Chua, J. L., Yao, J. S., & Cheng, C. K. (2017). A hybrid agent for automatically determining and extracting the 5Ws of Filipino news articles. CEUR Workshop Proceedings, 1986, 50-56. Retrieved from https://animorepository.dlsu.edu.ph/faculty_research/3314
Disciplines
Computer Sciences | Software Engineering
Keywords
Text data mining