A hybrid agent for automatically determining and extracting the 5Ws of Filipino news articles

College

College of Computer Studies

Department/Unit

Software Technology

Document Type

Conference Proceeding

Source Title

CEUR Workshop Proceedings

Volume

1986

First Page

50

Last Page

56

Publication Date

1-1-2017

Abstract

Copying permitted for private and academic purposes. As the number of sources of unstructured data continues to grow exponentially, manually reading through all of this data becomes notoriously time-consuming. There is therefore a need to understand and process this data faster, which can be achieved by automating the task with information extraction. In this paper, we present an agent that automatically detects and extracts the 5Ws (who, when, where, what, and why) from Filipino news articles using a hybrid of machine learning and linguistic rules. The agent caters specifically to the Filipino language by handling its distinctive features, such as ambiguous prepositions and markers, focus instead of subject and predicate, and dialect influences, among others. To get the most out of the machine learning algorithms, techniques such as linguistic tagging and weighted decision trees are used to preprocess and filter the data and to refine the final results. The agent achieved an accuracy of 63.33% for who, 71.38% for when, 58.25% for where, 89.20% for what, and 50.00% for why. Copyright © by the paper's authors.

Disciplines

Computer Sciences | Software Engineering

Keywords

Text data mining
