Date of Publication

4-5-2021

Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science

Subject Categories

Computer Sciences

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Advisor

Ethel Chua-Joy Ong

Defense Panel Chair

Charibeth K. Cheng

Defense Panel Member

Edward P. Tighe
Ethel Chua-Joy Ong

Abstract/Summary

Properly identifying the difficulty level of reading materials prescribed in an educational setting is key to effective learning and comprehension. Educators and publishers have relied on readability formulas to predict text readability. While the English language has a rich history of research in readability assessment, limited work has been done for the Filipino language. This study explores an extensive range of expert-identified linguistic predictors, spanning traditional, lexical, language model, syllable pattern, and morphological features, to train automatic readability assessment models using Logistic Regression, Support Vector Machines, and Random Forest. Over 265 story books and passages from Adarna House Inc. and DepEd Commons covering Grades 1, 2, and 3 were used to train the models. The feature selection results show that the best-performing configuration, at 66.1% accuracy, is a hybrid Random Forest model combining traditional (TRAD) and syllable pattern (SYLL) features. Global and local model interpretation showed that surface-based features used in older readability formulas, such as word count, average sentence length, and sentence count, remain relevant in measuring the readability of Filipino texts, but that combining them with deeper linguistic features yields better model performance. Future directions include using other types of written literature beyond story books to develop a more generalized readability assessment model, and using deep neural networks for automatic feature extraction.

Keywords: Readability Assessment, Filipino, Linguistic Features, Story Books

Language

English

Format

Electronic

Physical Description

128 leaves

Keywords

Readability (Literary style); Evaluation; Filipino language; Neural networks (Computer science); Children's books

Recommended Citation

Imperial, J. R. (2021). Exploring hybrid linguistic feature sets to measure Filipino text readability. Retrieved from https://animorepository.dlsu.edu.ph/etdm_comsci/5

Embargo Period

5-9-2021

Computer Science Master's Theses

Exploring hybrid linguistic feature sets to measure Filipino text readability
