Date of Publication

2024

Document Type

Dissertation/Thesis

Degree Name

Bachelor of Science (Honors) in Computer Science and Master of Science in Computer Science

College

College of Computer Studies

Department/Unit

Software Technology

Thesis Advisor

Charibeth K. Cheng

Defense Panel Chair

Ethel Chua Joy Ong

Defense Panel Member

Charibeth K. Cheng

Jennifer O. Contreras

Abstract (English)

Text style transfer involves automatically rewriting a sentence from one style into another. Exploring techniques for text style transfer is important because style plays a crucial role in making NLP systems more user-centered. However, research on text style transfer in non-English contexts has been limited, primarily because of the scarcity of resources such as parallel corpora, which are crucial for training text style transfer models. This limited work in non-English settings is a barrier to a comprehensive understanding of the current state of text style transfer approaches. This work therefore focuses on the Filipino language, in which text style transfer has not yet been explored, and specifically on the formality style transfer subtask, which aims to rewrite informal text in a formal style. To address the lack of parallel corpora in Filipino, this work proposes pseudo-parallel corpus construction, in which informal-formal text pairs are created using only non-parallel corpora. These pseudo-parallel pairs were used to train a sequence-to-sequence model to formalize Filipino text. Different modifications to the pipeline were explored, and performance was evaluated using the three standard metrics in text style transfer: style transfer score, meaning preservation, and fluency. Although the results show that the best model has below-average performance, the improvements gained from the pipeline modifications indicate that further adjustments to the methodology could still improve the quality of style transfer. This study recommends exploring better sentence representations, finding adjacent datasets for augmentation, and using aggregation-based scores to refine the dataset. Furthermore, more robust metric implementations should be used to obtain reliable evaluation scores for Filipino text style transfer.
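
The pseudo-parallel corpus construction described in the abstract can be illustrated with a minimal sketch. The abstract does not state which sentence representation or matching procedure the thesis used, so everything here is an assumption made for illustration: character-trigram counts stand in for the sentence representation, cosine similarity for the matching score, and the names `char_ngrams`, `cosine`, `mine_pairs`, and the `threshold` value are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded sentence."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mine_pairs(informal, formal, threshold=0.5):
    """Pair each informal sentence with its most similar formal sentence.

    Both inputs are non-parallel lists of sentences. Pairs that score
    below `threshold` (a hypothetical cutoff) are discarded.
    """
    formal_vecs = [(s, char_ngrams(s)) for s in formal]
    if not formal_vecs:
        return []
    pairs = []
    for src in informal:
        src_vec = char_ngrams(src)
        # Pick the formal candidate with the highest similarity to src.
        best, score = max(
            ((tgt, cosine(src_vec, vec)) for tgt, vec in formal_vecs),
            key=lambda item: item[1],
        )
        if score >= threshold:
            pairs.append((src, best, score))
    return pairs
```

Calling `mine_pairs(informal_sentences, formal_sentences)` returns `(informal, formal, score)` tuples that could then serve as training pairs for a sequence-to-sequence model. A learned sentence embedding would likely capture meaning better than surface trigrams, which is in line with the recommendation in the abstract to explore better sentence representations.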

Language

English

Embargo Period

4-17-2024
