A hybrid approach to extracting the 5Ws in Filipino news articles
Date of Publication
Bachelor of Science in Computer Science
College of Computer Studies
Defense Panel Member
Briane Paul Samson
The goal of this research is to develop an information extraction system for Filipino news articles that extracts the 5Ws, namely sino (who), ano (what), kailan (when), saan (where), and bakit (why), and produces output that reduces the effort required for further data analysis. An interface built on the output of the information extraction system allows users to view, search, and edit the extracted data in a structured format.
The information extraction system applies both rule-based and machine learning techniques, together with various tools, to perform text processing, candidate selection, and feature extraction. Text processing covers tokenization, sentence segmentation, named-entity recognition, part-of-speech tagging, and word scoring. Rule-based candidate selection then uses both the output of the text processing module and text markers. Finally, feature extraction is done through machine-learned candidate classification models for the who, when, and where features and through rule-based algorithms for the what and why features.
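As a rough illustration of the first two stages described above, the following minimal Python sketch performs naive sentence segmentation and tokenization, then selects who and where candidates using text markers. It is only a sketch under stated assumptions: the marker lists, the capitalization heuristic, and all names (WHO_MARKERS, WHERE_MARKERS, Candidate, select_candidates) are hypothetical and are not taken from the thesis. The named-entity recognition, part-of-speech tagging, word scoring, and machine-learned classification stages are not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical text markers, chosen for illustration only.
# The thesis does not list the markers it actually uses.
WHO_MARKERS = {"si", "sina", "ni", "nina"}
WHERE_MARKERS = {"sa"}


@dataclass
class Candidate:
    feature: str         # "who" or "where"
    text: str            # the candidate token
    sentence_index: int  # index of the sentence it came from


def split_sentences(text):
    """Naive segmentation: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def select_candidates(text):
    """Rule-based selection: a capitalized token that directly follows a
    marker becomes a candidate for that marker's feature."""
    candidates = []
    for i, sentence in enumerate(split_sentences(text)):
        tokens = tokenize(sentence)
        for marker, nxt in zip(tokens, tokens[1:]):
            if not nxt[0].isupper():
                continue  # assumed heuristic: candidates are capitalized
            if marker.lower() in WHO_MARKERS:
                candidates.append(Candidate("who", nxt, i))
            elif marker.lower() in WHERE_MARKERS:
                candidates.append(Candidate("where", nxt, i))
    return candidates


if __name__ == "__main__":
    article = "Dumating si Maria sa Maynila kahapon. Umuwi siya agad."
    for candidate in select_candidates(article):
        print(candidate)
```

In a hybrid design of the kind the abstract describes, candidates such as these would then be passed to a trained classifier that decides which one is the actual who, when, or where of the article, while the what and why features would be handled by separate rules.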
Furthermore, the information extraction system was evaluated alongside the system from the research of Cagampan (2014), which extracts the same features, in order to compare results against a similar system. However, the system in Cagampan's research is optimized for Filipino editorials rather than news articles.
The proponents' system achieved 63.3257% accuracy for 'who', 71.3768% accuracy for 'when', 58.2492% accuracy for 'where', 89.2% accuracy for 'what', and 50% accuracy for 'why'. Compared with Cagampan's system, the 'who', 'where', and 'what' feature extraction modules of the proponents' system performed better.
Archives, The Learning Commons, 12F, Henry Sy Sr. Hall
1 computer optical disc ; 4 3/4 in.
Text processing (Computer science); Natural language processing (Computer science); Information retrieval
Chua, J. L., Livelo, E. S., Ver, A. O., & Yao, J. S. (2016). A hybrid approach to extracting the 5Ws in Filipino news articles. Retrieved from https://animorepository.dlsu.edu.ph/etd_bachelors/11501