Date of Publication
2024
Document Type
Dissertation/Thesis
Degree Name
Bachelor of Science (Honors) in Computer Science and Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Software Technology
Thesis Advisor
Ethel C. Ong
Defense Panel Chair
Charibeth K. Cheng
Defense Panel Member
Edward P. Tighe
Abstract (English)
Machine reading comprehension (MRC) is a popular natural language processing task with applications in sectors such as customer service and healthcare. MRC has been integrated into software systems such as search engines and chatbots, and it serves as a benchmark for evaluating the performance of language models. Despite the large body of MRC research in high-resource languages such as English and Chinese, no work has been done on Filipino MRC. This study aims to kick-start the field of Filipino MRC by constructing the Filipino Question Answering Dataset (FilQuAD), the first dataset in this area, to facilitate the training and evaluation of Filipino MRC models. A total of 4063 question-answer pairs were gathered through manual data collection and synthetic data generation. Dataset analysis and model evaluation were conducted to understand the properties of the created questions and to benchmark existing Filipino language models on the MRC task. Model evaluation experiments show that cross-lingual language models significantly outperform Filipino models and that synthetic data augmentation improves model performance. The models struggled most on questions requiring multi-sentence reasoning and world knowledge, as well as questions with numeric answers.
Language
English
Recommended Citation
Pua, G. T. (2024). FilQuAD: A Filipino question answering dataset for machine reading comprehension. Retrieved from https://animorepository.dlsu.edu.ph/etdm_softtech/14
Upload Full Text
2024_Pua_PageswithSignature.pdf (638 kB)
2024_Pua_Chapter1.pdf (109 kB)
2024_Pua_Chapter2.pdf (634 kB)
2024_Pua_Chapter3.pdf (410 kB)
2024_Pua_Chapter4.pdf (576 kB)
2024_Pua_Chapter5.pdf (55 kB)
2024_Pua_AppendixA.pdf (755 kB)
2024_Pua_AppendixB.pdf (197 kB)
2024_Pua_References.pdf (95 kB)
2024_Pua_Dataset.zip (1258 kB)
Embargo Period
8-15-2024